Demo: Filtering Wikipedia articles with LineRate Proxy
Demo of LineRate Proxy and the xml2js Node.js module
The demo below was created by one of the LineRate developers during a Friday hack-a-thon day. The goal for the day was to create fun demos of some Node.js modules using LineRate Proxy. This demo shows how LineRate Proxy can intercept HTTP requests and then filter or redirect them based on additional XML data that the proxy requests from an off-box service.
For the full description and the code that implements this demo, see the full posting on GitHub.
Purpose
Wikipedia articles are publicly edited. Some edits are associated with particular usernames. However, anyone on the Internet can click "edit" and make a change; those edits are "by IP". Often, "by IP" edits are advertising robots, vandals, or people/organizations with a point-of-view to inject.
When a user goes through a proxy running this script, they see the latest revision of an article that was edited by a user with a username. This may be a better revision, because that edit was less likely to be from a robot or vandal or POV-pusher.
In reality, this demo is intended to exhibit the LineRate Proxy Scripting Engine.
How it works
A forward-proxy object is configured through the normal means (CLI, REST JSON API, or web GUI). A range virtual-ip is created that listens for connections intended for the Wikipedia servers. The script is attached to that forward proxy. The LineRate Proxy system is then placed into the network so that user requests are proxied through it.
When a user browses to a webpage, such as:
the browser does DNS resolution as normal. Then, it submits an HTTP request to the resolved address. Since the range virtual-ip is listening at that address, the forward-proxy receives the request. The script is invoked, and gets to choose how to handle it.
However, for the main article request, the scripting engine holds the request, and makes a new HTTP request to get the history of the article, for instance:
If the most recent revision was made by a registered user, the script invokes next() for the original user request, the request is passed through to the Wikipedia servers, and the response is returned to the user.
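One way to tell a registered author from a "by IP" author is to check whether the username recorded for the revision parses as an IP address. The following is a minimal sketch, assuming the history lookup has already been parsed down to a username string. The helper name isRegisteredUser is hypothetical, it is not part of the LineRate scripting API, and the original script may use a different test.

```javascript
const net = require('net');

// Hypothetical helper (not from the original demo).
// Assumption: "by IP" edits are recorded under the editor's IP address,
// so a username that parses as IPv4 or IPv6 is treated as anonymous.
function isRegisteredUser(username) {
  if (typeof username !== 'string' || username.length === 0) {
    return false; // no usable author information: do not trust the revision
  }
  // net.isIP returns 0 when the string is not an IP address.
  return net.isIP(username) === 0;
}
```

In the pass-through case described above, a true result for the newest revision's author is what would lead the script to call next().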
If the most recent revision was not made by a registered user, the script walks backward through the revision history until it finds a revision that was. It then writes a response back to the user that is a temporary redirect to that version of the page, like:
HTTP/1.1 302 Found
Location: http://en.wikipedia.org/w/index.php?title=Chunked_transfer_encoding&oldid=563242545
Content-Length: xxx

<html><head><title>Redirecting Chunked_transfer_encoding</title></head>
<body>
<h1>Redirecting to a human-edited version</h1>
<p>The last version of Chunked_transfer_encoding was edited by an anonymous user.</p>
<p>Redirecting to the last human-edited version: <a href="...">...</a></p>
</body></html>
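The walk through the history and the redirect itself can be sketched in plain Node.js. This is an illustration under stated assumptions, not the code from the original posting: the revision list is assumed to be already parsed into an array of { revid, user } objects sorted newest first, and the function names findRegisteredRevision and buildRedirect are hypothetical.

```javascript
const net = require('net');

// Assumed input: revisions sorted newest first, e.g. [{ revid: 42, user: 'Alice' }].
// Returns the newest revision whose author is not an IP address, or null.
function findRegisteredRevision(revisions) {
  for (const rev of revisions) {
    if (net.isIP(String(rev.user)) === 0) {
      return rev;
    }
  }
  return null;
}

// Escape text before placing it inside HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Build the pieces of a temporary redirect to a specific revision of an article.
function buildRedirect(title, revid) {
  const location = 'http://en.wikipedia.org/w/index.php?title=' +
    encodeURIComponent(title) + '&oldid=' + encodeURIComponent(revid);
  const safeTitle = escapeHtml(title);
  const safeLocation = escapeHtml(location);
  const body =
    '<html><head><title>Redirecting ' + safeTitle + '</title></head>\n' +
    '<body>\n' +
    '<h1>Redirecting to a human-edited version</h1>\n' +
    '<p>The last version of ' + safeTitle + ' was edited by an anonymous user.</p>\n' +
    '<p>Redirecting to the last human-edited version: ' +
    '<a href="' + safeLocation + '">' + safeLocation + '</a></p>\n' +
    '</body></html>\n';
  return {
    statusCode: 302,
    headers: {
      'Location': location,
      'Content-Type': 'text/html; charset=utf-8',
      'Content-Length': Buffer.byteLength(body)
    },
    body: body
  };
}
```

With a Node-style response object, the result could be sent with res.writeHead(r.statusCode, r.headers) followed by res.end(r.body). Computing Content-Length with Buffer.byteLength rather than the string length keeps the header correct when an article title contains non-ASCII characters.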