Archive for November, 2008
PaperCube
For the last 3 months, I’ve been working hard on my Master’s thesis project at Santa Clara University, code-named PaperCube. It is a rich internet application built with SproutCore, SVG, and the canvas tag. For even more interactivity, I use WebKit CSS transitions for animation. No Flash, Flex, or Silverlight.
The name PaperCube may sound a bit odd. If you’re familiar with my research topic, the first part makes sense: I’m researching ways to search digital libraries more effectively, with a focus on navigating networks of academic papers. The second part isn’t as clear, even to me. I want to show many different dimensions of the data, which is why I thought of a cube. I’ve finally reached a point where it is usable and relatively stable, so it has escaped my laptop and is now hosted on an actual web server.
Try PaperCube
What data can you search and view?
PaperCube now lets you search papers in the CiteSeer digital library. The data you can browse with PaperCube contains papers, references, authors, and affiliations: all the elements of a typical digital library. There are many digital libraries online, such as the ACM Digital Library and IEEE Xplore. However, they are available only to subscribers, and there are many intellectual property concerns.
Fortunately, Penn State University maintains CiteSeer, an open digital library containing more than 1 million papers. The data set is available for download in XML format. Unfortunately, the downloadable data set is from. Some of the problems are that papers do not have all of their references in the data set, the publication dates of some papers are incorrect, and authors are not uniquely identified. It is still a very good and quite usable data set, but it isn’t perfect. There is hope, though: a new version called CiteSeerX exists, and I really hope to use it eventually if it is made available for download.
Searching

search interface
A walk along the tool bar
PaperCube keeps a history as you browse and search papers. It behaves the same way as your browser’s history. If you press and hold the history button for more than half a second, you’ll see a menu of the papers you have visited, which you can select from. Switching views also adds a history item.
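The history behavior described above maps onto a familiar data structure: a list of entries plus a cursor, where a new visit after going back discards the "forward" entries. The sketch below is a hypothetical illustration, not PaperCube’s code; the names `BrowseHistory`, `HistoryEntry`, and `on_button_release` are invented, and only the half-second threshold comes from the post.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union

LONG_PRESS_MS = 500  # "more than half a second", as described in the post


@dataclass(frozen=True)
class HistoryEntry:
    paper_id: str  # hypothetical paper identifier
    view: str      # e.g. "Detail" or "Circle View"


@dataclass
class BrowseHistory:
    entries: List[HistoryEntry] = field(default_factory=list)
    cursor: int = -1  # index of the entry currently on screen

    def visit(self, paper_id: str, view: str) -> None:
        """Record a new paper or a view switch, dropping any 'forward' entries."""
        del self.entries[self.cursor + 1:]
        self.entries.append(HistoryEntry(paper_id, view))
        self.cursor = len(self.entries) - 1

    def current(self) -> Optional[HistoryEntry]:
        return self.entries[self.cursor] if self.cursor >= 0 else None

    def back(self) -> Optional[HistoryEntry]:
        """Step back one entry; does nothing at the oldest entry."""
        if self.cursor > 0:
            self.cursor -= 1
        return self.current()

    def menu(self) -> List[HistoryEntry]:
        """Entries listed in the press-and-hold menu, most recent first."""
        return list(reversed(self.entries[: self.cursor + 1]))

    def on_button_release(self, held_ms: int) -> Union[List[HistoryEntry], Optional[HistoryEntry]]:
        """A long press opens the menu; a short click goes back one step."""
        if held_ms > LONG_PRESS_MS:
            return self.menu()
        return self.back()
```

Because a view switch calls `visit` with the same paper and a new view, it lands in the history just like a new paper does.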
Next is the “Mode” drop-down menu, which lets you view either papers or authors. Further to the right is the “View” menu. When you switch the “Mode”, the “View” menu changes accordingly. The “View” menu lets you select one of the many views of a paper; I’ll describe each one in more detail later.
Moving further to the right, there is the “Direction” menu, which lets you view either references or citations. The distinction may seem confusing, but I want to show the relationships between papers in both directions. In my terminology, if paper A references paper B, then paper B is cited by paper A. Some views also have a “Max Iterations” menu. These views recursively explore the citation network of a paper, and sometimes you want to limit the number of papers retrieved. I’ve gone as far as retrieving 37,000 papers. Imagine showing that on your screen!
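The two directions and the “Max Iterations” limit can be illustrated with a depth-limited breadth-first walk. This is a sketch under assumptions: the post does not say how PaperCube traverses the network, the in-memory dictionaries stand in for data that the real application requests from a server, and `invert` and `explore` are hypothetical names.

```python
from collections import deque
from typing import Dict, List


def invert(references: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Build the 'cited by' map: if A references B, then B is cited by A."""
    cited_by: Dict[str, List[str]] = {}
    for paper, refs in references.items():
        for ref in refs:
            cited_by.setdefault(ref, []).append(paper)
    return cited_by


def explore(start: str, graph: Dict[str, List[str]], max_iterations: int) -> Dict[str, int]:
    """Breadth-first walk from `start`, at most `max_iterations` hops deep.

    Returns each reachable paper mapped to its hop distance from `start`.
    """
    depth = {start: 0}
    queue = deque([start])
    while queue:
        paper = queue.popleft()
        if depth[paper] == max_iterations:
            continue  # the "Max Iterations" cap: do not expand further
        for neighbor in graph.get(paper, []):
            if neighbor not in depth:  # each paper is retrieved only once
                depth[neighbor] = depth[paper] + 1
                queue.append(neighbor)
    return depth
```

Passing the references map explores the “references” direction; passing `invert(references)` explores citations. Since the number of papers can grow quickly with each extra hop, a small cap keeps the amount of retrieved data manageable.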
Finally, on the right side of the screen there is a slider that scales the visualization. Each visualization has its own range of zoom levels; for instance, the Papers Per Year view lets you zoom in 20 times. When you zoom in, a small preview window appears that you can use to move around the visible area. You can also zoom with your mouse’s scroll wheel. Zooming was basically the first thing I worked on, because I knew that resolution independence is very important for some views. If you can keep revealing more information as you zoom in further, the possibilities are endless.
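The zoom controls come down to a clamped scale factor plus a rectangle that marks the visible area inside the preview. The sketch below assumes that at 1x the visualization exactly fills the viewport and that each scroll-wheel notch multiplies the scale by a constant `step_factor`; both are illustrative choices, not details from the post. Only the 20x maximum for Papers Per Year comes from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: float
    y: float
    width: float
    height: float


def clamp_zoom(zoom: float, max_zoom: float) -> float:
    """Keep the scale between 1x (fit to screen) and the view's maximum."""
    return max(1.0, min(zoom, max_zoom))


def wheel_zoom(zoom: float, wheel_steps: int, max_zoom: float, step_factor: float = 1.25) -> float:
    """Each wheel notch multiplies (positive) or divides (negative) the scale."""
    return clamp_zoom(zoom * step_factor ** wheel_steps, max_zoom)


def preview_rect(zoom: float, scroll_x: float, scroll_y: float,
                 viewport_w: float, viewport_h: float,
                 preview_w: float, preview_h: float) -> Rect:
    """Position of the visible area inside a preview of the whole canvas.

    At zoom z the content measures (viewport_w * z, viewport_h * z), and the
    preview shrinks all of it to (preview_w, preview_h).
    """
    sx = preview_w / (viewport_w * zoom)
    sy = preview_h / (viewport_h * zoom)
    return Rect(scroll_x * sx, scroll_y * sy, viewport_w * sx, viewport_h * sy)
```

Dragging the preview rectangle is the inverse mapping: divide the rectangle’s position by `sx` and `sy` to get the new scroll offsets.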

history menu

view menu

zoom menu
The real reason for PaperCube: the views
All the views are interconnected. If you change the “View” in the tool bar, the view changes but the paper being shown does not, an advantage of using an MVC framework like SproutCore. Some of the views use a “fan” menu to aid navigation. When you click an element on the screen, the menu appears, letting you perform several actions, including zooming, navigating to other papers, and opening the paper on CiteSeer’s web site.
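The reason a view switch keeps the same paper is that the selection lives in shared controller state rather than in any one view. The sketch below shows that idea with a plain observer pattern in Python; it is not SproutCore’s API, and `SelectionController`, `observe`, `set_view`, and `set_paper` are hypothetical names.

```python
from typing import Callable, List


class SelectionController:
    """Shared state for all views: which paper is selected and how it is shown."""

    def __init__(self, paper_id: str, view: str) -> None:
        self.paper_id = paper_id
        self.view = view
        self._observers: List[Callable[[str, str], None]] = []

    def observe(self, callback: Callable[[str, str], None]) -> None:
        """Register a view that re-renders whenever the selection changes."""
        self._observers.append(callback)

    def _notify(self) -> None:
        for callback in self._observers:
            callback(self.paper_id, self.view)

    def set_view(self, view: str) -> None:
        """Switch views; the selected paper is left untouched."""
        self.view = view
        self._notify()

    def set_paper(self, paper_id: str) -> None:
        """Navigate to another paper in the current view."""
        self.paper_id = paper_id
        self._notify()
```

Views only read from the controller, so any number of them can render the same selection without knowing about each other.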
Detail View
First, there is the “Detail” view, which is closer to a traditional web-based digital library view of a paper. If you mouse over any of the references or citations, you will see more details about that paper. The reference/citation graph shows the number of papers per year.
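The per-year graph is a histogram over publication years. A minimal sketch, assuming years arrive as integers and that missing years are `None` (an assumption; the post only notes that some dates in the data set are wrong). Years with no papers are filled with zero so the graph has a continuous axis.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def papers_per_year(years: Iterable[Optional[int]]) -> Dict[int, int]:
    """Count papers per year, filling gaps between the first and last year with 0."""
    counts = Counter(year for year in years if year is not None)
    if not counts:
        return {}
    first, last = min(counts), max(counts)
    return {year: counts.get(year, 0) for year in range(first, last + 1)}
```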

details of a paper

showing citation additional info
Circle View
Second, there is “Circle View”. A simple version of Circle View was originally developed in 2004 as part of my undergraduate thesis at UC Santa Cruz. This visualization shows a paper and two levels of its references or citations. The color of the lines signifies the number of citations a paper has; the theory is that the number of times a paper is cited can imply its importance. One neat feature of Circle View is that a paper can be displayed more than once. For example, if a paper is referenced 3 times by papers in the visible area, it will be highlighted in all three places, letting you quickly see that it may be important. This draws on cognitive psychology theory, namely the feature-present/feature-absent effect: things that stand out are very easy to count. The same technique is used in other visualizations as well.
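Two ideas from Circle View lend themselves to a short sketch: listing every placement in a two-level layout (so a paper can appear more than once) and mapping citation counts to a line color. The gray-to-red color scale and the function names are hypothetical; the post does not specify which colors PaperCube uses.

```python
from collections import Counter
from typing import Dict, List, Tuple


def two_level_placements(root: str, graph: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Every (parent, child) edge drawn in a two-level view; repeats are kept."""
    placements = []
    for child in graph.get(root, []):
        placements.append((root, child))
        for grandchild in graph.get(child, []):
            placements.append((child, grandchild))
    return placements


def repeated_papers(placements: List[Tuple[str, str]]) -> Dict[str, int]:
    """Papers drawn in more than one place, with how many times each appears."""
    counts = Counter(child for _, child in placements)
    return {paper: n for paper, n in counts.items() if n > 1}


def line_color(citation_count: int, max_count: int) -> str:
    """Interpolate from light gray (few citations) to red (most cited)."""
    t = 0.0 if max_count <= 0 else min(citation_count / max_count, 1.0)
    r = round(200 + (220 - 200) * t)
    g = round(200 + (30 - 200) * t)
    return f"#{r:02x}{g:02x}{g:02x}"  # blue channel tracks green
```

Highlighting every key in `repeated_papers` wherever it is drawn is what makes repeated papers pop out.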

circle view showing meta data

zoomed in

showing the fan menu
Tree Map
Third, there is the “Tree Map” view. It is not a traditional tree map like the ones used to show disk usage. This visualization shows levels of references or citations. For references, the focused paper is on top; for citations, the focused paper is at the bottom. Each level of the map is one more reference hop away from the focused paper. A paper’s references or citations are drawn within the width of that paper, their parent in the tree, so papers become thinner and thinner at deeper levels. However, you can reveal more information by zooming in. A red box in the tree means that the references are narrower than 0.1 pixels and not worth showing. Zooming in reveals them, and they are requested from the server as needed.
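The width rule and the 0.1-pixel cutoff can be sketched as a recursive layout. The post does not say how a parent’s width is divided, so the even split below is an assumption, as are the names `Cell`, `layout`, and `row_y`. In the real application, hidden children would be fetched from the server only once zooming makes them wide enough.

```python
from dataclasses import dataclass
from typing import Dict, List

MIN_WIDTH_PX = 0.1  # narrower than this: draw a red box instead


@dataclass
class Cell:
    paper_id: str
    level: int       # hops away from the focused paper
    x: float         # left edge in screen pixels at the current zoom
    width: float
    collapsed: bool = False  # True: red box standing in for this paper's children


def layout(root: str, graph: Dict[str, List[str]], total_width: float,
           zoom: float, max_levels: int) -> List[Cell]:
    cells: List[Cell] = []

    def place(paper: str, level: int, x: float, width: float) -> None:
        cells.append(Cell(paper, level, x, width))
        children = graph.get(paper, [])
        if not children or level + 1 >= max_levels:
            return
        child_width = width / len(children)  # assumption: even split
        if child_width < MIN_WIDTH_PX:
            cells.append(Cell(paper, level + 1, x, width, collapsed=True))
            return
        for i, child in enumerate(children):
            place(child, level + 1, x + i * child_width, child_width)

    place(root, 0, 0.0, total_width * zoom)
    return cells


def row_y(level: int, direction: str, max_levels: int, row_height: float = 20.0) -> float:
    """References grow down from a focused paper on top; citations grow up."""
    if direction == "references":
        return level * row_height
    return (max_levels - 1 - level) * row_height
```

Multiplying the total width by the zoom factor is what turns a red box back into visible papers: with 2,000 references under a 100-pixel parent, each gets 0.05 pixels at 1x but 0.2 pixels at 4x.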

tree map view

zoomed in to reveal more papers
Papers Per Year
Last, there is the “Per Year” view. It is somewhat similar to the tree map, but papers are arranged by year. The focused paper is shown separately at the top, and its references are spread across the years. You can specify the number of recursions to perform. The more you do, the more complete your picture of the paper’s citation network will be, but it can be dangerous for your browser’s health because it can mean retrieving a lot of data. When you mouse over a paper whose references or citations have been downloaded (you can download more by raising the number of levels), you will see lines linking it to other papers. A blue line means a reference; a red line means a citation. You can pin the lines by clicking a paper and using the menu.
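The year columns and the hover lines can be sketched as two small functions: one groups papers by year while leaving the focused paper out, and one lists the colored links to draw. Only the blue/red convention and the "downloaded papers only" rule come from the post; `Paper`, `columns_by_year`, `hover_links`, and `visible_links` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

REFERENCE_COLOR = "blue"
CITATION_COLOR = "red"


@dataclass(frozen=True)
class Paper:
    paper_id: str
    year: int


def columns_by_year(papers: List[Paper], focused_id: str) -> Dict[int, List[str]]:
    """Group papers by year; the focused paper is drawn separately on top."""
    columns: Dict[int, List[str]] = defaultdict(list)
    for paper in papers:
        if paper.paper_id != focused_id:
            columns[paper.year].append(paper.paper_id)
    return {year: sorted(ids) for year, ids in sorted(columns.items())}


def hover_links(paper: str, references: Dict[str, List[str]],
                cited_by: Dict[str, List[str]], loaded: Set[str]) -> List[Tuple[str, str, str]]:
    """Lines for one paper, drawn only to papers that are already downloaded."""
    links = [(paper, other, REFERENCE_COLOR) for other in references.get(paper, []) if other in loaded]
    links += [(paper, other, CITATION_COLOR) for other in cited_by.get(paper, []) if other in loaded]
    return links


def visible_links(hovered: Optional[str], pinned: Set[str], references: Dict[str, List[str]],
                  cited_by: Dict[str, List[str]], loaded: Set[str]) -> List[Tuple[str, str, str]]:
    """Links for the hovered paper plus every pinned paper."""
    shown = set(pinned) | ({hovered} if hovered else set())
    return [link for p in sorted(shown) for link in hover_links(p, references, cited_by, loaded)]
```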

papers per year view

showing both ref and cites

papers per year view zoomed in
Conclusion
There’s still a lot of work to do. Next, I am going to implement the author component of the application. If I have time, there are many other visualizations I want to explore for both papers and authors. PaperCube works best in WebKit; WebKit with SquirrelFish and CSS transitions makes PaperCube really rock. However, Firefox 3 works just as well. Internet Explorer is just too slow for what I’m doing, and it does not natively support SVG or the canvas tag, both of which I use. I use SproutCore as the application framework to enable much of the rich interaction.
If you have any comments, suggestions for new views or features, questions, and especially bugs to report, please don’t hesitate to email me at pbergstr at me dot com or hit me up on the contact page.