Archive for the ‘P2P’ Category

Simple file verification

Friday, March 28th, 2008

Simple file verification (SFV) is a file format for storing CRC32 checksums of files so that their integrity can be verified. SFV can detect random corruption in a file, but it cannot be used to check authenticity in any meaningful way. SFV files typically use the .sfv extension.

Files can become corrupted for a variety of reasons, including faulty storage media, transmission errors, write errors during copying or moving, and software bugs. SFV verification checks that a file has not been corrupted by comparing the file’s CRC value to a previously calculated value. Because different inputs can share the same checksum, a corrupted file can occasionally still pass verification, but with random corruption the likelihood of such a collision is usually negligible.

SFV cannot be used to verify the authenticity of files, because CRC32 is not a collision-resistant hash function: even if the checksum file itself is not tampered with, it is computationally trivial for an attacker to cause deliberate hash collisions, so a malicious change to the file goes undetected by the comparison. In cryptography, this is called a collision attack. For this reason, the md5sum and sha1sum utilities, which use the MD5 and SHA-1 cryptographic hash functions respectively, are often preferred on Unix operating systems.

Even a single-bit error causes both SFV’s CRC check and md5sum’s cryptographic hash check to fail, which typically means the entire file must be re-fetched from scratch. For this reason, the Parchive and rsync utilities are often preferred for verifying that a file has not been accidentally corrupted in transmission, since they can correct common small errors with a much shorter download.

Despite these weaknesses, SFV is still a popular data verification technique, because SFV utilities take relatively little time to calculate CRC32 checksums, especially compared with the time taken to calculate cryptographic hashes such as MD5 or SHA-1. One of the first programs to use the SFV format was WinSFV.

SFV uses a plain text file containing one line for each file and its checksum, in the format FILENAME CHECKSUM. Any line starting with a semicolon ‘;’ is considered a comment and is ignored for the purposes of file verification. The delimiter between the filename and the checksum is always one or more spaces; tabs are never used.
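The file format described above is simple enough to check with a few lines of Python. The following is a minimal sketch, not a replacement for a real SFV utility: the function names are made up for illustration, and it assumes that the checksum is written as hexadecimal digits in the last space-separated field of each line, that listed files sit next to the .sfv file, and that the .sfv file is readable in the platform's default text encoding.

```python
import zlib
from pathlib import Path


def crc32_of_file(path, chunk_size=65536):
    """Return the CRC32 of a file as an unsigned 32-bit integer."""
    crc = 0
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        while chunk := handle.read(chunk_size):
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


def parse_sfv(text):
    """Yield (filename, expected_crc) pairs from the text of an SFV file."""
    for line in text.splitlines():
        line = line.rstrip()
        if not line or line.startswith(";"):
            continue  # blank line or comment
        # Assumption: the checksum is the last space-separated field,
        # so filenames containing spaces still parse correctly.
        name, _, checksum = line.rpartition(" ")
        yield name.rstrip(" "), int(checksum, 16)


def verify_sfv(sfv_path):
    """Return {filename: bool}; False means a CRC mismatch or a missing file."""
    sfv_path = Path(sfv_path)
    results = {}
    for name, expected in parse_sfv(sfv_path.read_text()):
        target = sfv_path.parent / name
        results[name] = target.is_file() and crc32_of_file(target) == expected
    return results
```

Calling `verify_sfv("release.sfv")` (a hypothetical file name) returns a dictionary mapping each listed file to whether its CRC32 matched. As the text notes, a match only rules out accidental corruption; it says nothing about authenticity.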

Attacks on peer-to-peer networks

Friday, March 7th, 2008

Many peer-to-peer networks are under constant attack by people with a variety of motives.

Examples include:

* poisoning attacks (e.g. providing files whose contents are different from the description)
* polluting attacks (e.g. inserting “bad” chunks/packets into an otherwise valid file on the network)
* freeloaders, sometimes known as ‘leechers’ (users or software that make use of the network without contributing resources to it)
* insertion of viruses into carried data (e.g. downloaded or carried files may be infected with viruses or other malware)
* malware in the peer-to-peer network software itself (e.g. distributed software may contain spyware)
* denial of service attacks (attacks that may make the network run very slowly or break completely)
* filtering (network operators may attempt to prevent peer-to-peer network data from being carried)
* identity attacks (e.g. tracking down the users of the network and harassing or legally attacking them)
* spamming (e.g. sending unsolicited information across the network, not necessarily as a denial of service attack)

Most attacks can be defeated or controlled by careful design of the peer-to-peer network and through the use of encryption. P2P network defense is in fact closely related to the “Byzantine Generals Problem”. However, almost any network will fail when the majority of the peers are trying to damage it, and many protocols can be rendered ineffective by far fewer malicious peers.

Computer science perspective

Friday, March 7th, 2008

Technically, a completely pure peer-to-peer application must implement only peering protocols that do not recognize the concepts of “server” and “client”. Such pure peer applications and networks are rare. Most networks and applications described as peer-to-peer actually contain or rely on some non-peer elements, such as DNS. Also, real-world applications often use multiple protocols and act as client, server, and peer simultaneously, or over time. Completely decentralized networks of peers have been in use for many years: two examples are Usenet (1979) and FidoNet (1984).

Many P2P systems use stronger peers (super-peers, super-nodes) as servers, with client-peers connected in a star-like fashion to a single super-peer.

In the late 1990s, Sun added classes to the Java technology to speed the development of peer-to-peer applications, so that developers could build decentralized real-time chat applets and applications before instant messaging networks were popular. This effort is now being continued with the JXTA project.

Peer-to-peer systems and applications have attracted a great deal of attention from computer science research; some prominent research projects include the Chord project, the PAST storage utility, P-Grid (a self-organized and emerging overlay network), and the CoopNet content distribution system.

Legal controversy

Friday, March 7th, 2008

Peer-to-peer technologies are rarely considered illegal in and of themselves. However, many peer-to-peer technologies are frequently used to share files without the permission of the copyright owner, which is illegal in most countries unless a license permitting redistribution (such as the GPL or GFDL) has been granted for the exchanged files, or the exchanged materials have entered the public domain.

Other uses of peer-to-peer, such as telephony, are typically not nearly so controversial, although the provision of telephony is restricted in some legal jurisdictions around the world.

Unstructured and structured P2P networks

Friday, March 7th, 2008

The P2P overlay network consists of all the participating peers as network nodes. There is a link between any two nodes that know each other: if a participating peer knows the location of another peer in the P2P network, then there is a directed edge from the former node to the latter in the overlay network. Based on how the nodes in the overlay network are linked to each other, P2P networks can be classified as unstructured or structured.

An unstructured P2P network is formed when the overlay links are established arbitrarily. Such networks are easy to construct, because a new peer that wants to join can copy the existing links of another node and then form its own links over time. When a peer in an unstructured P2P network wants to find a piece of data, the query has to be flooded through the network to reach as many peers as possible that share the data. The main disadvantage of such networks is that queries may not always be resolved. Popular content is likely to be available at several peers, so any peer searching for it is likely to find it; but if a peer is looking for rare data shared by only a few other peers, the search is highly unlikely to succeed. Since there is no correlation between a peer and the content it manages, there is no guarantee that flooding will find a peer that has the desired data. Flooding also causes a large amount of signalling traffic in the network, so such networks typically have very poor search efficiency. Most of the popular P2P networks, such as Gnutella and FastTrack, are unstructured.

Structured P2P networks employ a globally consistent protocol to ensure that any node can efficiently route a search to some peer that has the desired file, even if the file is extremely rare. Such a guarantee necessitates a more structured pattern of overlay links. By far the most common type of structured P2P network is the distributed hash table (DHT), in which a variant of consistent hashing is used to assign ownership of each file to a particular peer, in a way analogous to a traditional hash table’s assignment of each key to a particular array slot. Some well-known DHTs are Chord, Pastry, Tapestry, CAN, and Tulip. HyperCuP is a structured P2P network that does not use a DHT approach.
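To make the contrast concrete, here is a rough Python sketch of both search styles. It is a toy under stated assumptions, not how Gnutella or any real DHT is implemented: `flood_search` and `HashRing` are hypothetical names, the flood uses a hop limit (TTL), SHA-1 is an arbitrary choice of ring hash, and the ring has a global view of all peers, whereas real DHTs such as Chord route a lookup over several hops with only partial knowledge.

```python
import hashlib
from bisect import bisect_left
from collections import deque


def flood_search(links, holders, start, wanted, ttl):
    """TTL-limited flooding: return (peers found holding `wanted`, messages sent)."""
    seen = {start}
    queue = deque([(start, ttl)])
    found, messages = set(), 0
    while queue:
        peer, hops_left = queue.popleft()
        if wanted in holders.get(peer, ()):
            found.add(peer)
        if hops_left == 0:
            continue  # the query is not forwarded any further
        for neighbour in links.get(peer, ()):
            messages += 1  # every forwarded query costs one message
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops_left - 1))
    return found, messages


def ring_position(key):
    """Map a string to a point on the hash ring."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest(), "big")


class HashRing:
    """Toy consistent-hashing ring: a key belongs to the first peer at or after it."""

    def __init__(self, peers):
        self._ring = sorted((ring_position(peer), peer) for peer in peers)
        self._points = [point for point, _ in self._ring]

    def owner(self, key):
        # Wrap around to the first peer when the key hashes past the last one.
        index = bisect_left(self._points, ring_position(key)) % len(self._ring)
        return self._ring[index][1]
```

With flooding, a rare file held by a peer beyond the hop limit is simply not found, and the message count grows with every link the query crosses. With the ring, any peer can compute the owner of a key directly, and removing one peer only reassigns the keys that peer owned.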

Advantages of peer-to-peer networks

Friday, March 7th, 2008

An important goal in peer-to-peer networks is that all clients provide resources, including bandwidth, storage space, and computing power. Thus, as nodes arrive and demand on the system increases, the total capacity of the system also increases. This is not true of a client-server architecture with a fixed set of servers, in which adding more clients could mean slower data transfer for all users.

The distributed nature of peer-to-peer networks also increases robustness in case of failures by replicating data over multiple peers and, in pure P2P systems, by enabling peers to find the data without relying on a centralized index server. In the latter case, there is no single point of failure in the system.
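The scaling argument can be illustrated with a back-of-the-envelope model. This is a deliberately idealized sketch: the function names and parameters are invented for illustration, and it assumes every peer's upload capacity is fully usable, which real networks only approximate.

```python
def client_server_rate(server_capacity, clients):
    """Per-client download rate when a fixed server pool serves everyone."""
    return server_capacity / clients


def p2p_rate(seed_capacity, peer_upload, peers):
    """Per-peer download rate when every peer also contributes upload capacity."""
    return (seed_capacity + peer_upload * peers) / peers
```

In this model the client-server rate falls towards zero as clients are added, while the peer-to-peer rate never drops below `peer_upload`, because each new peer brings its own capacity with it.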

Classifications of peer-to-peer networks

Friday, March 7th, 2008

Peer-to-peer networks can be classified by what they can be used for:

* file sharing
* telephony
* media streaming (audio, video)
* discussion forums

Another way to classify peer-to-peer networks is by their degree of centralization.

In ‘pure’ peer-to-peer networks:

* Peers act as equals, merging the roles of client and server
* There is no central server managing the network
* There is no central router

Some examples of pure peer-to-peer application layer networks designed for file sharing are Gnutella and Freenet.

There are also many hybrid peer-to-peer systems, in which:

* A central server keeps information on peers and responds to requests for that information.
* Peers are responsible for hosting available resources (as the central server does not have them), for letting the central server know what resources they want to share, and for making their shareable resources available to peers that request them.
* Route terminals are used as addresses, which are referenced by a set of indices to obtain an absolute address.

e.g.

* Centralized P2P network such as Napster
* Decentralized P2P network such as KaZaA
* Structured P2P network such as CAN
* Unstructured P2P network such as Gnutella
* Hybrid P2P network (Centralized and Decentralized) such as JXTA (an open source peer-to-peer protocol specification)
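
The hybrid arrangement described above, in which a central server keeps information on peers but does not host the resources itself, can be sketched as a small in-memory index. This is only an illustration: `IndexServer` and its methods are hypothetical names, not the protocol of Napster or any other real system, and the actual file transfer between peers is left out.

```python
from collections import defaultdict


class IndexServer:
    """Hypothetical central index: it knows who shares what, but stores no files."""

    def __init__(self):
        self._holders = defaultdict(set)  # resource name -> peer addresses

    def announce(self, peer_address, resource_names):
        """A peer tells the server which resources it is willing to share."""
        for name in resource_names:
            self._holders[name].add(peer_address)

    def withdraw(self, peer_address):
        """Forget a peer that has left the network."""
        for peers in self._holders.values():
            peers.discard(peer_address)

    def lookup(self, resource_name):
        """Return the peers to contact; the download itself is peer to peer."""
        return sorted(self._holders.get(resource_name, ()))
```

A peer would call `announce` when it joins, other peers would call `lookup` and then fetch the resource directly from one of the returned addresses. The index remains a single point of failure, which is exactly what pure peer-to-peer designs avoid.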