Elsewhere

Overview

Elsewhere is a Node.js project that aims to replicate part of the functionality of Google’s now-discontinued Social Graph API. When given the URL of a person’s website or social media profile (e.g. a Twitter account), it outputs a JSON-formatted list of the other websites and social media profiles that belong to that person. In other words, it can determine a person’s ‘social graph’ from a single URL in the graph.

Elsewhere can be set up as a web service, providing a JSON API that can be easily queried over a network. It can also be included as a Node module and used directly within a server-side project.

Try the Elsewhere explorer at example.elsewherejs.com

Download

Star the project on GitHub, or download it:

Elsewhere on GitHub Elsewhere v0.0.4

How does it work?

To use Elsewhere, simply provide it with a URL. Elsewhere will use this target as the entry point to the graph and will search it for links that contain the attribute rel=me:

<a href="http://dharmafly.com" rel="me">Dharmafly</a>

The rel=me attribute is a microformat to assert that the link is to a website, page or resource that is owned by (or is about) the same person as the page the link is on.
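As a rough illustration of what searching a page for rel=me links involves, here is a minimal sketch. It is not Elsewhere's own code: the function name findRelMeLinks is made up for this example, and it uses a regular expression purely to stay dependency-free, whereas a real crawler would use a proper HTML parser.

```javascript
// Minimal sketch: pull rel="me" links out of an HTML string.
// Assumption: a regular expression is good enough for simple, well-formed
// anchor tags. A real crawler would use an HTML parser instead.
function findRelMeLinks(html) {
  const links = [];
  const anchorPattern = /<a\b[^>]*>/gi;
  for (const [tag] of html.matchAll(anchorPattern)) {
    const rel = /\brel\s*=\s*["']([^"']*)["']/i.exec(tag);
    const href = /\bhref\s*=\s*["']([^"']*)["']/i.exec(tag);
    if (!rel || !href) continue;
    // rel may hold several space-separated tokens, e.g. rel="me nofollow"
    const tokens = rel[1].toLowerCase().split(/\s+/);
    if (tokens.includes("me")) links.push(href[1]);
  }
  return links;
}

const sampleHtml = `
  <a href="http://dharmafly.com" rel="me">Dharmafly</a>
  <a href="http://example.com/friend" rel="friend">A friend</a>
  <a href="https://github.com/dharmafly" rel="me nofollow">GitHub</a>
`;
console.log(findRelMeLinks(sampleHtml));
// [ 'http://dharmafly.com', 'https://github.com/dharmafly' ]
```

Note that rel is treated as a list of tokens, so a link marked rel="me nofollow" still counts.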

For example, a Twitter profile usually contains a link to that person’s or company’s website, and this link carries the rel=me microformat.

When Elsewhere finds a rel=me link or links at a URL, it searches each of them for more, building a comprehensive graph along the way.

For example, a person’s Twitter profile page may link to their home page, which in turn links to their Last.fm, Flickr, Facebook, GitHub, LinkedIn and Google+ profiles.

Elsewhere can only search public profiles and webpages for links. If a page isn’t public, Elsewhere can’t search it for links. It’s also worth noting that profile owners deliberately place these links on their profiles to make them discoverable. If the profile owner has neglected to place a link there, Elsewhere won’t find one.

Once Elsewhere has run out of new rel=me links to search, it returns a list of all the URLs it has found. This list is what is referred to as the ‘social graph’; its owner is the owner of the URL you initially gave Elsewhere.
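The crawl described here can be pictured as a breadth-first search that stops when no unseen links remain. The sketch below is only an illustration under simplifying assumptions: crawl and getRelMeLinks are hypothetical names, the pages are made-up data held in memory, and the real crawler fetches pages over the network asynchronously.

```javascript
// Sketch of the crawl loop: breadth-first search over rel="me" links.
// `getRelMeLinks` is a hypothetical stand-in for "fetch the page and
// extract its rel=me links"; here it simply reads from an in-memory map.
function crawl(startUrl, getRelMeLinks) {
  const seen = new Set([startUrl]);
  const queue = [startUrl];
  while (queue.length > 0) {
    const url = queue.shift();
    for (const link of getRelMeLinks(url)) {
      if (!seen.has(link)) {
        seen.add(link); // remember it so it is only visited once
        queue.push(link);
      }
    }
  }
  return [...seen];
}

// Made-up pages for illustration only.
const crawlPages = {
  "https://twitter.com/alice": ["http://alice.example"],
  "http://alice.example": ["https://github.com/alice", "https://twitter.com/alice"],
  "https://github.com/alice": ["http://alice.example"],
};
const lookupLinks = (url) => crawlPages[url] || [];
console.log(crawl("https://twitter.com/alice", lookupLinks));
```

The set of seen URLs is what keeps the crawl from looping forever when pages link back to each other.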

Elsewhere can make strict checks to verify that each linked URL is indeed owned by the same person as the original site. After all, anyone could create a website, add a rel=me link to Elvis Presley’s website and claim to be him.

Elsewhere checks if the linked page itself has a rel=me link back to the original URL. If there is such a reciprocal link, then the relationship is deemed to be ‘verified’.

But Elsewhere is more sophisticated than that. The reciprocal link doesn’t have to be directly between the two sites. For example, if a Twitter account links to a GitHub account, which links to a home page, which links back to the Twitter account, then the relationship between the Twitter account and home page will be verified, even though the two don’t directly link to each other.
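One way to picture this indirect verification is as a reachability check: a URL counts as verified if some chain of rel=me links leads from it back to the starting URL. The following is a sketch of that idea only, not Elsewhere's actual implementation. It assumes the graph has already been crawled into a plain object, and findVerified is a name invented for the example.

```javascript
// Sketch: mark a URL as verified when some chain of rel="me" links leads
// from it back to the starting URL. Assumption: the crawled graph is a
// plain object mapping each URL to its outbound rel=me links.
function findVerified(graph, startUrl) {
  // Build the reverse graph: for each target, which pages link to it?
  const inbound = {};
  for (const [from, targets] of Object.entries(graph)) {
    for (const to of targets) {
      (inbound[to] = inbound[to] || []).push(from);
    }
  }
  // Walk backwards from the start URL; every page reached this way
  // has a path of links leading back to the start.
  const verified = new Set();
  const queue = [startUrl];
  while (queue.length > 0) {
    const url = queue.shift();
    for (const source of inbound[url] || []) {
      if (source !== startUrl && !verified.has(source)) {
        verified.add(source);
        queue.push(source);
      }
    }
  }
  return verified;
}

// Made-up graph: Twitter -> GitHub -> home page -> Twitter, plus a page
// that is linked to but never links back.
const linkGraph = {
  "https://twitter.com/alice": ["https://github.com/alice", "http://fan.example"],
  "https://github.com/alice": ["http://alice.example"],
  "http://alice.example": ["https://twitter.com/alice"],
  "http://fan.example": [],
};
const verifiedSet = findVerified(linkGraph, "https://twitter.com/alice");
console.log(verifiedSet.has("http://alice.example")); // true
console.log(verifiedSet.has("http://fan.example")); // false
```

Here the home page is verified even though the Twitter account does not link to it directly, while the page with no link back stays unverified.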

Elsewhere operates in non-strict mode by default, in which it returns both verified and unverified URLs. This mode is useful because many profile pages and personal websites lack rel=me links back to the pages that link to them, so those relationships cannot be verified and strict checking would drop many legitimate links.

To be absolutely sure of the stated relationships, turn on strict mode (by setting the strict option to true) and only verified URLs will be returned.

URL shorteners and redirects

When Elsewhere follows a link and that link resolves to a different URL, the resolved URL takes precedence over the original. For instance:

http://github.com/chrisnewtn -> https://github.com/chrisnewtn
http://t.co/vV5BWNxil2       -> http://chrisnewtn.com

The original links to a page are still shown in the graph in that page’s urlAliases collection, but as far as the rest of the graph is concerned, that link is now known by its resolved name.

Were URL shorteners and redirects ignored, you’d end up with a situation where both http://github.com/user and https://github.com/user were in your graph as two separate pages, which is clearly incorrect.
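To make the merging concrete, here is a small sketch of collapsing original URLs into one node per resolved URL. It is an illustration under assumptions rather than the project's code: mergeAliases is a made-up name, and the redirect table stands in for actually following HTTP redirects (it only handles a single hop). The urlAliases field name is taken from the description above.

```javascript
// Sketch: collapse shortened or redirected URLs into one node per
// resolved URL. `redirects` is a made-up lookup table standing in for
// following real HTTP redirects; only single-hop redirects are handled.
function mergeAliases(urls, redirects) {
  const nodes = new Map();
  for (const original of urls) {
    const resolved = redirects[original] || original;
    if (!nodes.has(resolved)) {
      nodes.set(resolved, { url: resolved, urlAliases: [] });
    }
    const node = nodes.get(resolved);
    // Record the original only if it differs and is not already listed.
    if (original !== resolved && !node.urlAliases.includes(original)) {
      node.urlAliases.push(original);
    }
  }
  return [...nodes.values()];
}

const redirectTable = {
  "http://github.com/chrisnewtn": "https://github.com/chrisnewtn",
  "http://t.co/vV5BWNxil2": "http://chrisnewtn.com",
};
const mergedNodes = mergeAliases(
  [
    "http://github.com/chrisnewtn",
    "https://github.com/chrisnewtn",
    "http://t.co/vV5BWNxil2",
  ],
  redirectTable
);
console.log(JSON.stringify(mergedNodes, null, 2));
```

Three input links end up as two nodes, each remembering the alias it was reached through.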

Getting Started

Elsewhere requires Node.js to be installed first.

Clone the repo and start the server by running these commands in the terminal:

git clone git@github.com:dharmafly/elsewhere.git
cd elsewhere
npm install
bin/elsewhere

Now head to localhost:8888. You can test the API on this page by entering a URL into the ‘url’ box and clicking ‘Parse’. This will render the graph as a list on the page, complete with the names of each page of the graph and their respective favicons.

You can also test the API by simply appending the target URL to your address bar like so:

http://localhost:8888/?url=http://chrisnewtn.com

This will return a JSON version of the graph, e.g.:

{
  results: [
    {
      url: "http://chrisnewtn.com",
      title: "Chris Newton",
      favicon: "http://chrisnewtn.com/favicon.ico",
      outboundLinks: {
        verified: [ ... ],
        unverified: [ ]
      },
      inboundCount: {
        verified: 4,
        unverified: 0
      },
      verified: true,
      urlAliases: [
        "http://t.co/vV5BWNxil2"
      ]
    }
  ],
  warnings: [
    "http error: 404 (Not Found) - http://twitter.com/statuses/user_timeline/chrisnewtn.rss"
  ],
  query: "http://chrisnewtn.com",
  created: "2012-10-12T16:30:57.270Z",
  crawled: 9,
  verified: 9
}

The initial crawl will take a while, as each page needs to be visited, checked and cached. Once cached though, it should be pretty snappy.
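If you would rather query the local server from a script than from the address bar, something along these lines should work. It is a sketch under assumptions: the server is the one started above on localhost:8888, the response has the shape shown in the sample output, and the helper names (buildQueryUrl, verifiedPages, fetchGraph) are invented for this example. fetchGraph relies on the fetch function built into Node 18 and later.

```javascript
// Sketch: build the query URL for a locally running Elsewhere server and
// pick the verified pages out of a response shaped like the sample output.
// Assumption: the server listens on localhost:8888 as set up earlier.
function buildQueryUrl(targetUrl, base = "http://localhost:8888/") {
  const query = new URL(base);
  // searchParams percent-encodes the target, so any "?" or "&" inside
  // the target URL does not get mixed up with the query string itself.
  query.searchParams.set("url", targetUrl);
  return query.toString();
}

// Keep only the pages whose `verified` flag is true.
function verifiedPages(response) {
  return response.results
    .filter((page) => page.verified)
    .map((page) => page.url);
}

// Needs Node 18+ for the built-in fetch. Defined but not called here,
// because it requires the server to be running.
async function fetchGraph(targetUrl) {
  const res = await fetch(buildQueryUrl(targetUrl));
  if (!res.ok) throw new Error(`Request failed with status ${res.status}`);
  return res.json();
}

console.log(buildQueryUrl("http://chrisnewtn.com"));
// http://localhost:8888/?url=http%3A%2F%2Fchrisnewtn.com
```

With the server running, fetchGraph("http://chrisnewtn.com").then(verifiedPages) would give you just the verified URLs from the response.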

See the API Reference for more details.

Changelog

  • v0.0.4
    • Changed error handling and surfaced warnings in the graph data. Various other changes.
  • v0.0.3
    • Replace JSDOM with Cheerio. Add domain limiter. Various other changes.
  • v0.0.2
    • Tweaks to the signature of the graph method. Various internal changes.
  • v0.0.1
    • First viable version of