Bias goggles is a rapidly evolving prototype system that enables users to explore the bias characteristics of 2LD / 3LD web domains for a specific user-defined biased concept (i.e., a bias goggle). Since there is no objective definition of what bias and biased concepts are, users are able to define their own. For each such concept, the approach computes the support score and the bias score of a web domain by considering the support of this domain for each aspect of the biased concept. These support scores are currently computed by graph-based algorithms that exploit the structure of the web graph and a set of user-defined seeds for each aspect of bias.
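To make the relationship between per-aspect support and the two domain-level scores concrete, here is a minimal sketch. It assumes a domain is described by a vector with one support value in [0, 1] per aspect. The aggregation formulas are hypothetical stand-ins, not the ones used by bias goggles: `support_score` is taken as the mean support over aspects, and `bias_score` as the normalized total variation distance between the domain's support distribution and a uniform one (0 when all aspects are supported equally, 1 when only one aspect is supported).

```python
from typing import Sequence


def support_score(aspect_support: Sequence[float]) -> float:
    """Hypothetical aggregate: mean support of a domain over all aspects."""
    return sum(aspect_support) / len(aspect_support)


def bias_score(aspect_support: Sequence[float]) -> float:
    """Hypothetical bias measure: how far the support is from being uniform.

    Normalizes the support vector into a distribution, computes its total
    variation distance to the uniform distribution, and rescales the result
    to [0, 1] (the maximum distance, (n - 1) / n, maps to 1).
    """
    n = len(aspect_support)
    if n < 2:
        raise ValueError("a biased concept needs at least two aspects")
    total = sum(aspect_support)
    if total == 0:
        return 0.0  # no support for any aspect: treated as unbiased here
    uniform = 1.0 / n
    dist = [s / total for s in aspect_support]
    tv = 0.5 * sum(abs(p - uniform) for p in dist)
    return tv * n / (n - 1)


if __name__ == "__main__":
    v = [0.8, 0.0]  # illustrative values: strong support for aspect A only
    print(support_score(v), bias_score(v))  # prints: 0.4 1.0
```

Separating the two scores this way lets a domain with weak but one-sided support be told apart from a domain with strong, evenly split support.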
Currently, we are developing scalable indexes to support larger crawled subsets of the web. We are also investigating content-based approaches for computing the bias scores, along with refinements of our graph-based approaches. Finally, we are developing cross-browser plugins that will be provided for free, so that everyday users can inspect and monitor the content they consume over time.
The system in its present form provides two predefined bias goggles. Each bias goggle consists of a set of sets of seeds (in our case, singleton sets) that describe the different aspects of bias. In the near future, when we release the browser plugins, users will be able to submit their own seeds and define their own specific bias goggles.
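The "set of sets of seeds" structure can be sketched as a small data type: one named seed set per aspect of the biased concept. This is an illustrative assumption, not the system's actual schema. The class name `BiasGoggle`, the method `aspect_of`, the rule that a domain may seed only one aspect, and the `*.example` domains are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional


@dataclass
class BiasGoggle:
    """Hypothetical goggle: a biased concept plus one seed set per aspect."""
    concept: str
    aspects: Dict[str, FrozenSet[str]]  # aspect name -> seed domains

    def __post_init__(self) -> None:
        if len(self.aspects) < 2:
            raise ValueError("a biased concept needs at least two aspects")
        seen: Dict[str, str] = {}
        for aspect, seeds in self.aspects.items():
            if not seeds:
                raise ValueError(f"aspect {aspect!r} has no seeds")
            for domain in seeds:
                # Assumed rule: a seed domain belongs to exactly one aspect.
                if domain in seen:
                    raise ValueError(
                        f"{domain!r} seeds both {seen[domain]!r} and {aspect!r}"
                    )
                seen[domain] = aspect

    def aspect_of(self, domain: str) -> Optional[str]:
        """Return the aspect a seed domain belongs to, or None for non-seeds."""
        for aspect, seeds in self.aspects.items():
            if domain in seeds:
                return aspect
        return None


# Singleton seed sets, as in the predefined goggles (domains are placeholders).
goggle = BiasGoggle(
    "example concept",
    {"aspect A": frozenset({"a.example"}), "aspect B": frozenset({"b.example"})},
)
```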
The two available bias goggles are:
Our current dataset is a subset of the Greek web. In total, we have downloaded 893,095 pages including 531,296,739 links, which correspond to a graph of 90,419 domains with 288,740 links between the domains, and a graph diameter of 7,944 nodes. We have run a number of experiments to check the validity of our algorithms using a golden collection of biased domains, and the results of our work were accepted and presented at the ECIR 2020 conference. We are working hard to expand our crawled data and provide new datasets for evaluation.
The domains' graph and the golden collection can be downloaded from the links below:
The support score of any domain for each aspect of bias is currently computed using graph-based algorithms. Specifically, we exploit the well-known Independent Cascade (IC) and Linear Threshold (LT) propagation models. In addition, we have introduced a variation of the PageRank algorithm, named Biased PageRank, which models various behaviours of biased surfers. More information about the algorithms is given in the ECIR 2020 paper.
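As a rough illustration of how a propagation model can turn seeds into per-domain support, the sketch below runs Monte Carlo simulations of the standard Independent Cascade model and reports, for each domain, the fraction of runs in which it becomes active. Several details are assumptions rather than the paper's method: support is propagated against link direction (a domain that links to an active domain may become active), every edge uses the same activation probability `p`, and the function names are hypothetical. The LT model and Biased PageRank are not shown.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List


def reverse_graph(graph: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each domain to the domains that link to it."""
    rev: Dict[str, List[str]] = defaultdict(list)
    for src, dsts in graph.items():
        for dst in dsts:
            rev[dst].append(src)
    return rev


def ic_support(graph: Dict[str, List[str]], seeds: Iterable[str],
               p: float = 0.5, runs: int = 1000, seed: int = 0) -> Dict[str, float]:
    """Estimate each domain's support for one aspect via Independent Cascade.

    In every run, each newly activated domain gets a single chance, with
    probability p, to activate each inactive domain that links to it.
    Support is the fraction of runs in which a domain ends up active.
    """
    rng = random.Random(seed)  # fixed seed for reproducible estimates
    rev = reverse_graph(graph)
    seed_list = list(seeds)
    counts: Dict[str, int] = defaultdict(int)
    for _ in range(runs):
        active = set(seed_list)
        frontier = list(seed_list)
        while frontier:
            next_frontier = []
            for node in frontier:
                for neighbour in rev.get(node, []):
                    if neighbour not in active and rng.random() < p:
                        active.add(neighbour)
                        next_frontier.append(neighbour)
            frontier = next_frontier
        for node in active:
            counts[node] += 1
    return {node: c / runs for node, c in counts.items()}


# Toy domain graph (hypothetical): b links to a, a links to the seed s.
toy = {"a": ["s"], "b": ["a"], "c": []}
print(ic_support(toy, {"s"}, p=0.5, runs=2000))
```

Running the function once per aspect, each time with that aspect's seeds, yields the per-aspect support vector of every domain.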
Panagiotis Papadakos email: papadako at ics dot forth seconddot gr