The Dark Web Explained – What lies beneath the Deep Web
DEEP WEB: The Darkness That Lies Beneath …
According to QI researchers, more than 90% of the Internet consists of spam, while less than 1% is pornography. One might have expected there to be far more nooky than Viagra adverts on the Web.
In truth, very little is actually known about the ever-changing world of the Web, as new discoveries and developments are forever being brought to the table. In fact, it is almost impossible to predict what the Internet will be like in ten years' time, let alone in the distant future.
There is, however, one quite interesting dark side of the Internet that has existed for some time, yet which very few people know about. This is something known as the Dark Net or Deep Web.
What is the Deep Web? How did it come about?
Once upon a time (in 1999), at the University of Edinburgh, an Irish student named Ian Clarke produced a thesis for his computer science course proposing a revolutionary new way for people to use the Internet without detection.
He called his project a “Distributed, Decentralised Information Storage and Retrieval System”. The idea was that by downloading Clarke’s unique software (which he intended to distribute for free) anyone could chat online, share files or set up a website with almost complete anonymity.
To cut a long story short, Clarke's tutors weren't too impressed, but this didn't stop the student from going ahead with his project. He released his software, called Freenet, in 2000. Since then, Freenet has been downloaded at least two million times, and it is now readily available from several websites.
Entering the Realm of the Deep Web
After downloading the 10 MB file, installing the software takes barely a couple of minutes and requires minimal computer skills. Then you enter a previously hidden online world where you can find resources such as “The Terrorist’s Handbook: A practical guide to explosives and other things of interest to terrorists”. Freenet is also the portal to accessing pirated copies of books, games, movies, music, software, TV series and much more.
What perhaps started as a seemingly innocent project has today become a means for a plethora of online criminal activity, from creating and sharing viruses to accessing and distributing child pornography (all anonymously, of course). In the process, the Deep Web has created a subculture of Internet users.
The Internet has always been associated with openness and is often labeled as the ultimate form of freedom; a place where free speech, free access and lack of censorship have prevailed. Yet where do we draw the line when it is simply becoming easier to engage in online criminal activity without being traced?
To put it into better perspective, the Dark Web has grown so fast that it is estimated to be at least 500 times larger than the surface web.
How is the Deep Web different from the Surface Web?
To put it very simply, the web as we usually experience it is a collection of hyperlinked pages that are indexed by search engines. In other words, the pages and content that appear when we do a Google search are the Internet as we know it, and this is called the Surface Web.
The Dark Web, also known as the deep web, invisible web, and dark net, consists of web pages and data that are beyond the reach of search engines. Some of what makes up the Deep Web consists of abandoned, inactive web pages, but the majority of the data within it has been deliberately crafted to avoid detection in order to remain anonymous.
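To make the surface/deep distinction concrete, here is a minimal sketch, assuming a tiny hypothetical link graph (every page name is invented): a crawler only finds pages it can reach by following hyperlinks from a known starting page, so anything unlinked, such as database query results or members-only pages, never makes it into the index.

```python
from collections import deque

# Hypothetical toy web: each page maps to the pages it links to.
LINKS = {
    "home": ["news", "about"],
    "news": ["article-1", "article-2"],
    "about": ["home"],
    "article-1": [],
    "article-2": ["news"],
    # Pages that nothing on the surface links to (query results, private forums).
    "search-results?q=x": [],
    "members-only-forum": ["members-thread-1"],
    "members-thread-1": [],
}

def crawl(seeds, links):
    """Breadth-first crawl: return every page reachable by following hyperlinks."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

surface = crawl(["home"], LINKS)  # what a search engine could index
deep = set(LINKS) - surface       # everything the crawler never reached
print(sorted(surface))  # ['about', 'article-1', 'article-2', 'home', 'news']
print(sorted(deep))     # ['members-only-forum', 'members-thread-1', 'search-results?q=x']
```

Real search engines are far more sophisticated, but the basic limitation is the same: a page with no inbound links from the indexed web stays out of sight.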
According to Wikipedia, Michael K. Bergman, who is credited with coining the phrase “deep web”, compares searching on the Internet today to dragging a net across the surface of the ocean: a great deal may be caught in the net, but a wealth of information lies deeper and is therefore missed.
In 2001, Bergman published a paper on the Deep Web that is still regularly cited today. “The Deep Web is currently 400 to 550 times larger than the commonly defined World Wide Web,” he wrote.
“The Deep Web is the fastest growing category of new information on the internet … The value of Deep Web content is immeasurable … Internet searches are searching only 0.03% … of the [total web] pages available.” – Bergman
How deep does the dark net go?
No doubt the Internet has changed significantly in the past eight years, yet researchers have only just begun to plunge into the depths of the Deep Web. The bottom line is that there is simply too much data for any search engine to index the entire deep web.
Coupled with this issue is the deliberate use of invisible web space by individuals who do not want to be found. This is how groups of criminals send out millions of spam e-mails claiming that you have won an international lottery before quickly disconnecting. No matter what developments are made toward catching such crooks, they will always find new ways to remain hidden.
Craig Labovitz, chief scientist at Arbor Networks, a leading online security firm, was quoted in the Guardian as saying: “In 2000 dark and murky address space was a bit of a novelty. This is now an entrenched part of the daily life of the Internet.”
“Defunct online companies; technical errors and failures; disputes between Internet service providers; abandoned addresses once used by the U.S. military in the earliest days of the Internet — all these have left the online landscape scattered with derelict or forgotten properties, perfect for illicit exploitation, sometimes for only a few seconds before they are returned to disuse … it just takes a PC and [an Internet] connection.” – Labovitz
Is there any light to the darkness?
Surely it was not young Ian Clarke’s vision to create a breeding ground for online criminals, which is sadly the predominant direction that the Deep Web seems to have taken. He merely wanted to offer free software to those seeking anonymous online communication.
There are secretive parts of the Internet that were specifically designed for U.S. secret service field agents and law enforcement officers to surf questionable websites and services without leaving tell-tale tracks. However, these seem to work more to the advantage of the crooks being sought.
Perhaps the Dark Net would make sense in countries with oppressive regimes such as China, where the government goes to farcical extremes to censor images that contain large expanses of supposedly naked flesh. It could certainly have a positive impact in countries such as Iran, allowing people to rally support against oppressive governments without fear of being apprehended.
It’s a shuddering thought that due to the immense size and growth of the Deep Web there is virtually no way to stop it. It may not all be bad but there is a large enough criminal aspect to it to warrant concern. Clarke even admits that child pornography exists on Freenet, yet claims that it would be detrimental to try and put a stop to it.
“At Freenet we could establish a virus to destroy any child pornography on Freenet — we could implement that technically. But then whoever has the key [to that filtering software] becomes a target. Suddenly we’d start getting served copyright notices; anything suspect on Freenet, we’d get pressure to shut it down. To modify Freenet would be the end of Freenet.” – Ian Clarke
Perhaps for the meantime it’s safest to stick to Google.
Related articles & links:
- The Pirate Bay
- Ian Clarke’s blog
- The Freenet project
- The dark side of the Internet
The Freenet download is only 10 MB, not 200 MB as stated.
Thank you for pointing that out. The edit has been made
worked…well pages never download lol
The Internet of Things is likely to increase the depth exponentially; the technologies and paradigms are only now becoming mature enough for devs to go and get the data. Web services and mashups are how this stuff gets manifested.
Great post. Here is a good article that adds some additional detail to the topic and a good set of links to the deep web search engines and other helpful sites.
Thanks for the links guys
This article is full of flawed information. The deep web and darknets are two totally different things. The deep web consists of internet pages that are not indexed by search engines, and these account for hundreds of times the amount of data in webpages that are indexed by search engines. And the VAST majority of the deep web is not criminal at all.
Darknets are networks of computers that share information between them, usually friend to friend. Meaning Alice trusts Bob, Bob trusts Carrol, so Alice can talk to Carrol through Bob, etc. These networks are not websites and cannot be indexed by search engines. Criminals use these sorts of private P2P darknets a lot, but they are primarily used by file sharers (criminal, but hardly). Once law enforcement infiltrates a darknet like this they can usually map out the majority of its members easily, although even darknets like this give them some trouble.
Then there are anonymous networks. A lot of people use the terms darknet and anonymous network interchangeably, but this isn't really appropriate. In an anonymous network there are nodes that users connect through before they reach the target. For example, Tor hidden services work like this:
Alice —> Node 1 —> Node 2 —> Node 3 —> Rendezvous Node <— Node 4 <— Node 5 <— Node 6 Bob —> Carrol (even if Alice trusts Bob and the data sent through him is encrypted). So I think it is silly to call both of them darknets, at least without explaining the difference between a non-anonymous darknet and an anonymous darknet.
Freenet combines the properties of a traditional darknet with the properties of an anonymous darknet.
Alice —> Bob —> Carrol —> Node 1 —> Node 2 —> Node 3 —> Doug —> Eren —> Frank
where Alice trusts Bob and Bob trusts Carrol, Doug trusts Eren and Eren trusts Frank, and the nodes are not trusted by anyone but bridge the gap between the two darknets so that Alice and Frank can talk.
Correction on hidden services (sorry, I don't know how I typed that wrong):
Alice —> Node 1 —> Node 2 —> Node 3 —> Rendezvous Node <— Node 4 <— Node 5 <— Node 6 Bob —> Carrol
even if Alice trusts Bob, and the data sent through him is encrypted for Carrol.
The fact of the matter is that law enforcement has trouble locating and taking down traditional darknets, but once they infiltrate them they can take down large numbers of members.
When it comes to most anonymity networks (especially freenet I would say) it is going to be difficult for anyone other than advanced Military Signals Intelligence agencies, such as NSA, to trace communications, even after infiltration.
I am not sure why, but no matter how I type the hidden service protocol out, it seems to come out corrupted. I won't spam the forum trying to do it right; moderator, please combine my posts simply by replacing the hidden service diagram with this:
Alice -> node 1 -> node 2 -> node 3 -> rendezvous <- node 4 <- node 5 <- node 6 <- Bob
Again, sorry for the spam. I don't know why it's not showing up correctly; it's weird!
@anonymous: Thank you for your contributions there. I won’t call it clarity (apart from the darknet vs deepweb) because I honestly don’t understand that node business! But I’m hoping that others will and that they gain something from it.
I didn’t combine your comments because I find it quite interesting that your “hidden service protocol” isn’t showing up! That IS weird…
>it is going to be difficult for anyone other than advanced Military Signals Intelligence agencies, such as NSA, to trace communications, even after infiltration – Hectic! :)
Hello, I imagine the hidden service protocol did not show up because of some automatic formatting issue: > and – etc. are sometimes not displayed as plain text but interpreted as formatting.
I will try and explain the differences more clearly.
The anonymity network I am most versed in is Tor, not Freenet, but they work on similar principles. The Tor network consists of volunteer-run computers; at the current time there are around 1,800 of them making up the network. These computers are called nodes, although technically any computer on a network is a node. The difference between Tor and a 'basic' darknet is that Tor sends communications through three randomly selected nodes before they reach the target (hidden services use up to 7, with 3-4 selected by the sender and 3 selected by the receiver). The communications are encrypted in layers:
Alice — Node 1 — Node 2 — Node 3 — Bob
First, Alice encrypts her message to node 3, then encrypts that ciphertext to node 2, and then encrypts the result to node 1. She also encrypts node 2's address to node 1, node 3's address to node 2, and the destination address to node 3. As the encrypted communication moves through the network, one layer is removed at each hop until the plain text exits from node 3 to the destination. Replies follow the reverse path, also with layers of encryption. The nodes keep no logs unless they are malicious.
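The layering described above can be sketched in Python. This is a toy illustration only, assuming each node already shares a secret key with Alice (the key exchange is omitted) and using a made-up SHA-256 keystream as a stand-in cipher; real Tor uses proper ciphers, key negotiation and fixed-size cells, and this code is not secure. The functions `wrap` and `peel` are hypothetical names.

```python
import hashlib
import os

NONCE_SIZE = 16

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key, nonce and a counter. NOT a secure cipher."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def xor_layer(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR data with the keystream; the same call both adds and removes a layer."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, nonce, len(data))))

def wrap(message: bytes, path: list, destination: bytes, keys: list) -> bytes:
    """Alice builds the onion from the inside out: node 3's layer first, node 1's last.

    Each layer holds the length-prefixed address of the next hop plus the inner onion.
    """
    next_hop = destination
    onion = message
    for node, key in zip(reversed(path), reversed(keys)):
        nonce = os.urandom(NONCE_SIZE)
        header = bytes([len(next_hop)]) + next_hop
        onion = nonce + xor_layer(key, nonce, header + onion)
        next_hop = node  # the layer outside this one must point at this node
    return onion

def peel(key: bytes, onion: bytes):
    """A node removes exactly one layer and learns only (next_hop, inner_onion)."""
    nonce, body = onion[:NONCE_SIZE], onion[NONCE_SIZE:]
    plain = xor_layer(key, nonce, body)
    hop_len = plain[0]
    return plain[1:1 + hop_len], plain[1 + hop_len:]

path = [b"node1", b"node2", b"node3"]
keys = [os.urandom(32) for _ in path]           # assumed pre-shared with Alice
onion = wrap(b"hello Bob", path, b"bob", keys)  # Alice sends this to node 1
hop, onion = peel(keys[0], onion)  # node 1 learns only b"node2"
hop, onion = peel(keys[1], onion)  # node 2 learns only b"node3"
hop, onion = peel(keys[2], onion)  # node 3 learns b"bob" and the plaintext
print(hop, onion)  # b'bob' b'hello Bob'
```

Each call to `peel` reveals only the next hop, which mirrors the point that node 1 learns node 2's address but never the destination or the message.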
This is a different type of network than one where Alice, Bob, Carrol and Dan are members that form a web of trust:
Alice trusts Bob , Bob Trusts Carrol and Carrol trusts Dan.
Alice — Bob —- Carrol — Dan
The reason it is different is that in Tor the nodes are “stupid”: they don't know who is sending information over them or what the information is. If the FBI wants to infiltrate a criminal organization that is using Tor, compromising a single member does not accomplish anything; they would need to focus their efforts on compromising Tor itself. If the organization is using the second model of darknet, compromising a single member will quickly lead to the compromise of the entire group:
Alice is compromised which leads to Bob which leads to Carrol which leads to Dan etc.
Alice — node 1 —- node 2 — node 3 —- Bob
Alice is compromised, which leads to node 1; but node 1 is not a member of the organization, bears no legal liability for running an onion router, kept no logs, and only talked to node 2 anyway.
That is the big difference between the two sorts of darknet (anonymous and normal being the classifiers I suppose).
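The difference in how far a single compromise spreads can be sketched as a graph walk. This is a simplified model, assuming an investigator can follow every who-knows-whom link found on a seized machine, and that Tor nodes keep no logs (so node 1 leads nowhere). The names come from the comment; the `exposed` function is hypothetical.

```python
from collections import deque

def exposed(graph, start):
    """Everyone reachable by following known-contact links from one compromised member."""
    seen = {start}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for contact in graph.get(person, []):
            if contact not in seen:
                seen.add(contact)
                queue.append(contact)
    return seen

# Web-of-trust darknet: members connect directly to people they know.
trust_net = {
    "Alice": ["Bob"],
    "Bob": ["Alice", "Carrol"],
    "Carrol": ["Bob", "Dan"],
    "Dan": ["Carrol"],
}

# Onion-routed model: Alice's only direct contact is node 1, which
# (by assumption) kept no logs, so the trail stops there.
onion_net = {
    "Alice": ["node 1"],
    "node 1": [],
}

print(sorted(exposed(trust_net, "Alice")))  # ['Alice', 'Bob', 'Carrol', 'Dan']
print(sorted(exposed(onion_net, "Alice")))  # ['Alice', 'node 1']
```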
I think that law enforcement is highly dishonest when they answer journalists' questions about technology such as Freenet or Tor by replying that they are aware of darknets and are able to compromise them. Sure, they are aware of darknets and can compromise them, but they are talking about darknets like the ones LimeWire supports, not military-grade darknets like Tor (which was developed by the Navy and has strong ties to intelligence agencies).
Attacks on Tor are much more sophisticated than compromising a single node and mapping out the network. There are a few primary attacks:
The NSA can compromise Tor because it has compromised exchange centers. For more information on this, just google "NSA AT&T exchange center" and you will see the story. The internet looks like this (hopefully the formatting doesn't mess up; C represents a computer):
CCCCCCCCCC
^^^^^^^^^^^^^^
ISP ISP ISP ISP
^^^^^^^^^^^^^^
Exchange Point
Multiple computers share an ISP, and ISPs communicate between each other over exchange centers, also called peering points or simply IXes. In the USA (and likely other countries), most exchange points are under constant monitoring by NSA. Even if a Tor node does not log a connection, and even if the ISP doesn’t (which some do but most don’t), the connection data is still logged at the IX where the NSA has access to it. This sort of monitoring is only defeated by Mixes, an anonymity property that Tor does not have.
Tor nodes forward data as soon as they receive it. Mixes gather data from hundreds of people and then re-order it internally (the ISP and IX can't see this, only someone who can see the internal state of the mix) before sending it out in re-ordered batches. This means that even with ISP logging and IX logging, mixes provide anonymity (that's right, even the NSA can't trace heavily mixed traffic). Unfortunately, it also means that it takes days for traffic to move from one end of the network to the other. Check out Mixminion and Mixmaster, two extremely anonymous mix networks for sending untraceable e-mail (if you can wait a week for your e-mail to show up!).
Anyway, sorry to go off topic. Another way that lesser agencies than the NSA can trace Tor is to fill it with compromised nodes and try to own both the entry and the exit node of a circuit.
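The batching idea can be sketched as a simple threshold mix: hold messages until a batch fills, then release them all at once in random order. This is an illustration under stated assumptions, not how Mixmaster or Mixminion actually schedule their pools (they use more elaborate pool strategies), and it assumes messages are re-encrypted in transit so an observer cannot match them by content. The class name `ThresholdMix` is hypothetical.

```python
import random

class ThresholdMix:
    """Toy threshold mix: hold messages until batch_size have arrived,
    then release the whole batch at once in a random order."""

    def __init__(self, batch_size, rng=None):
        self.batch_size = batch_size
        self.pool = []
        self.rng = rng if rng is not None else random.Random()

    def receive(self, sender, message):
        """Queue one message; return the flushed batch, or [] while still waiting."""
        self.pool.append((sender, message))
        if len(self.pool) < self.batch_size:
            return []
        batch, self.pool = self.pool, []
        self.rng.shuffle(batch)
        # Senders are dropped on output: an observer at the ISP or IX sees
        # batch_size messages go in and the same number come out together,
        # in an order unrelated to arrival.
        return [message for _sender, message in batch]

mix = ThresholdMix(batch_size=4, rng=random.Random(0))
for sender in ["alice", "bob", "carrol"]:
    assert mix.receive(sender, "hi from " + sender) == []  # held in the pool
batch = mix.receive("dave", "hi from dave")  # the fourth message triggers a flush
print(len(batch))  # 4
```

The waiting is the price of the anonymity: no message leaves until enough others have arrived to hide it among, which is why real mix networks can take so long to deliver.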
Alice — Node 1 — Node 2 — Node 3 — Bob
If node 1 and node 3 are compromised, the attacker can identify that Alice is sending traffic into the Tor network, and they can identify node 2. Since they own node 3, and since Tor doesn't mix, if node 3 gets traffic from node 2 right after node 1 sends traffic to node 2, the adversary can determine with very high accuracy that it is the same traffic. This allows the attacker, for the most part, to identify Alice (there is some level of doubt, but it is so small as to be essentially non-existent. Maybe node 2 dropped Alice's traffic and forwarded someone else's traffic from a different entry node? Possible, but not likely).
A particularly smart attacker would also flood the network with relay nodes that terminate a circuit if they detect that node 1 and node 2 are not owned by the attacker. This forces Alice to cycle through node combinations faster, in the hope that eventually the attacker will own her entry and exit node.
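The correlation step can be sketched as matching timestamps seen at the two compromised nodes. This is a deliberately simplified model with invented observations: real timing attacks compare traffic patterns across many packets rather than single timestamps, and `correlate` and `max_delay` are hypothetical names.

```python
def correlate(entry_sends, exit_receives, max_delay):
    """Return (client, destination) pairs whose timestamps line up.

    entry_sends:   (client, time) pairs seen by the compromised entry node
                   as it forwards each client's traffic to the middle node.
    exit_receives: (destination, time) pairs seen by the compromised exit node
                   as traffic arrives from that same middle node.
    A match is any exit event that follows an entry event within max_delay seconds.
    """
    matches = []
    for client, t_sent in entry_sends:
        for destination, t_seen in exit_receives:
            if 0 <= t_seen - t_sent <= max_delay:
                matches.append((client, destination))
    return matches

# Hypothetical observations in seconds; Xavier and Yvonne are unrelated users.
entry_sends = [("Alice", 10.00), ("Xavier", 14.50)]
exit_receives = [("Bob", 10.12), ("Yvonne", 20.30)]
print(correlate(entry_sends, exit_receives, max_delay=0.5))  # [('Alice', 'Bob')]
```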
Alice — Node 1 — Compromised Node — Node 2 — Bob
The compromised node can see node 1 and node 2, and if they are not both owned by the same attacker as the compromised node, it breaks the connection, forcing a new circuit and a new chance of compromise.
Federal agencies need to be smart if they consider trying to flood Tor with compromised nodes, though. 1,800 nodes means there are 1,619,100 possible combinations of entry and exit node. If an attacker floods 400 compromised nodes into this (bringing the total node count up to 2,200), then there are 79,800 combinations of entry and exit they can do a timing attack on, but the total number of combinations rises to 2,418,900. That leaves 2,418,900 − 79,800 = 2,339,100 combinations they cannot trace, which is 2,339,100 − 1,619,100 = 720,000 more than before. By increasing the combinations they can trace by 79,800, they increase the combinations they cannot trace by 720,000!
Anyway, sorry to get all boring and technical. I just wanted to drop by to once again say that no one short of the NSA / Mossad / GCHQ is likely able to trace Tor. As a matter of fact, the FBI and Interpol are well documented as consistently failing to do so.
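A rough Monte Carlo sketch of this circuit-killing strategy, assuming uniform random choice of three distinct relays (real Tor path selection is bandwidth-weighted and uses guard nodes) and the figures of 1,800 honest and 400 compromised relays used in this comment. `compromise_rate` is a hypothetical helper.

```python
import random

def compromise_rate(honest, compromised, trials, drop_circuits, seed=0):
    """Fraction of the circuits Alice ends up using whose entry AND exit are attacker-owned.

    Simplified model: three distinct relays are chosen uniformly at random.
    With drop_circuits=True, an attacker-owned middle relay tears down any
    circuit whose two ends the attacker does not both own, so Alice retries.
    """
    rng = random.Random(seed)
    relays = [False] * honest + [True] * compromised  # True = attacker-owned
    owned = 0
    for _ in range(trials):
        while True:
            entry, middle, exit_ = rng.sample(relays, 3)
            ends_owned = entry and exit_
            if drop_circuits and middle and not ends_owned:
                continue  # circuit killed by the compromised middle relay
            break
        owned += ends_owned
    return owned / trials

baseline = compromise_rate(1800, 400, 50_000, drop_circuits=False)
attacked = compromise_rate(1800, 400, 50_000, drop_circuits=True)
print(f"without circuit-killing: {baseline:.3f}, with: {attacked:.3f}")
```

Under this model the exact probabilities work out to about 3.3% of circuits compromised without circuit-killing and about 4.0% with it, so the simulated rates should land near those values.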
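The arithmetic above can be checked with a few lines of Python. Like the comment, this counts unordered entry/exit pairs; counting ordered pairs would double every figure but leave the ratios unchanged. It also prints the share of pairs the attacker can trace after flooding. `pair_counts` is a hypothetical helper.

```python
from math import comb

def pair_counts(honest, compromised):
    """Unordered entry/exit pairs: (total, traceable, untraceable)."""
    total = comb(honest + compromised, 2)
    traceable = comb(compromised, 2)   # both ends attacker-owned
    untraceable = total - traceable    # at least one honest end
    return total, traceable, untraceable

before = pair_counts(1800, 0)
after = pair_counts(1800, 400)
print(before)                          # (1619100, 0, 1619100)
print(after)                           # (2418900, 79800, 2339100)
print(after[2] - before[2])            # 720000
print(round(after[1] / after[0], 4))   # 0.033
```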
Bye!
Good grief man! Thank you for taking the time to share your essay with the rest of the group – it was quite a read :)
I’m hoping that it is useful to the more mathematically-minded. If not, you might find the article on The Pirate Bay a bit easier to digest …
I had my first online account in 195, had a six digit ICQ number. This article took me back to those days on the web.
195? AD? That’s a long time ago ;)