Martin Kováčik

Links to internet resources

Wiki Vandalism issues

  1. Anti-vandalism Ideas: http://meta.wikimedia.org/wiki/Anti-vandalism_ideas
  2. Delayed commits: http://wikifeatures.wiki.taoriver.net/moin.cgi/DelayedCommits
  3. Reputation Management: http://en.wikipedia.org/wiki/Reputation_management
  4. Favorite pages of banned users: http://en.wikipedia.org/wiki/Wikipedia:Favorite_pages_of_banned_users
  5. Automatic edit war squashing: http://meta.wikimedia.org/wiki/Automatic_edit_war_squashing
  6. Wiki Spam: http://meta.wikimedia.org/wiki/Wiki_Spam

Wiki Article Validation

  1. What is a featured article: http://en.wikipedia.org/wiki/Wikipedia:What_is_a_featured_article
  2. Article validation: http://meta.wikimedia.org/wiki/Article_validation
  3. Article validation possible problems: http://meta.wikimedia.org/wiki/Article_validation_possible_problems
  4. Article validation feature: http://meta.wikimedia.org/wiki/Article_validation_feature
  5. Article validation proposals: http://meta.wikimedia.org/wiki/Article_validation_proposals

Proposals and projects to use for wiki protection

  1. CAPTCHA: www.captcha.net
  2. jCaptcha: http://jcaptcha.sourceforge.net/
  3. jCaptcha J2EE integration: http://www.javaworld.com/javaworld/jw-03-2005/jw-0307-captcha.html

Keyword extraction

Keyword Extraction Using Naive Bayes

URL: http://www.cs.bilkent.edu.tr/~guvenir/courses/cs550/Workshop/Yasin_Uzun.pdf [PDF]

As the internet grows, the amount of electronic text increases rapidly. This brings the advantage of reaching the information sources in a cheap and quick way. Keywords are useful tools as they give the shortest summary of the document. But they are rarely included in the texts. There are proposed methods for automated keyword extraction. This paper also introduces such a method, which identifies the keywords with their frequencies and positions in the training set. It uses Naïve Bayesian Classifier with supervised learning.
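
A minimal sketch of the idea in the abstract above: each candidate word is described by its frequency and its position in the document, and a Naive Bayes classifier trained on documents with known keywords decides whether it is a keyword. The bucket thresholds (2 occurrences, first 30% of the text), the function names and the add-one smoothing are assumptions made for illustration, not details taken from the paper.

```python
import math
from collections import Counter, defaultdict

def features(word, tokens):
    """Describe a word by two discretised features (thresholds are assumed)."""
    count = tokens.count(word)
    first = tokens.index(word) / len(tokens)  # relative position of first use
    freq_bucket = "high" if count >= 2 else "low"
    pos_bucket = "early" if first < 0.3 else "late"
    return (("freq", freq_bucket), ("pos", pos_bucket))

def train(documents):
    """documents: list of (tokens, set_of_known_keywords) pairs."""
    class_counts = Counter()
    feature_counts = defaultdict(Counter)
    for tokens, keywords in documents:
        for word in set(tokens):
            label = word in keywords
            class_counts[label] += 1
            for f in features(word, tokens):
                feature_counts[label][f] += 1
    return class_counts, feature_counts

def keyword_score(word, tokens, model):
    """Log-odds that `word` is a keyword; positive means 'keyword'."""
    class_counts, feature_counts = model
    total = sum(class_counts.values())
    log_prob = {}
    for label in (True, False):
        # Prior with add-one smoothing over the two classes.
        lp = math.log((class_counts[label] + 1) / (total + 2))
        for f in features(word, tokens):
            # Each feature has two possible values, hence the +2.
            lp += math.log((feature_counts[label][f] + 1)
                           / (class_counts[label] + 2))
        log_prob[label] = lp
    return log_prob[True] - log_prob[False]
```

Words with a positive score would be proposed as keywords. A real system would need far more training data and finer-grained features than this two-bucket toy.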


Using Keyword Extraction for Web Site Clustering

URL: http://tcc.itc.it/people/pianta/publications/wse2003clustKeywords.pdf [PDF]

Reverse engineering techniques have the potential to support Web site understanding, by providing views that show the organization of a site and its navigational structure. However, representing each Web page as a node in the diagrams that are recovered from the source code of a Web site leads often to huge and unreadable graphs. Moreover, since the level of connectivity is typically high, the edges in such graphs make the overall result still less usable. Clustering can be used to produce cohesive groups of pages that are displayed as a single node in reverse engineered diagrams. In this paper, we propose a clustering method based on the automatic extraction of the keywords of a Web page. The presence of common keywords is exploited to decide when it is appropriate to group pages together. A second usage of the keywords is in the automatic labeling of the recovered clusters of pages.
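
The grouping and labeling steps described in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: pages are merged when the Jaccard overlap of their keyword sets reaches a threshold (single-link grouping), and each cluster is labeled with its most frequent keyword. The similarity measure, the threshold of 0.5 and all names are hypothetical choices, not the method of the paper.

```python
from collections import Counter

def jaccard(a, b):
    """Share of keywords two pages have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_pages(page_keywords, threshold=0.5):
    """page_keywords: dict mapping page name -> set of keywords.

    Returns a list of (label, pages) tuples.
    """
    pages = sorted(page_keywords)
    parent = {p: p for p in pages}

    def find(p):
        # Union-find lookup with path halving.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    # Merge every pair of pages whose keyword sets overlap enough.
    for i, a in enumerate(pages):
        for b in pages[i + 1:]:
            if jaccard(page_keywords[a], page_keywords[b]) >= threshold:
                parent[find(b)] = find(a)

    groups = {}
    for p in pages:
        groups.setdefault(find(p), []).append(p)

    result = []
    for members in groups.values():
        counts = Counter(k for p in members for k in page_keywords[p])
        # Label: most frequent keyword, ties broken alphabetically.
        label = min(counts, key=lambda k: (-counts[k], k)) if counts else ""
        result.append((label, members))
    return result
```

Each returned cluster would then be drawn as a single labeled node in the site diagram instead of one node per page.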

Spam Protection and Recognition

jASEN - java Anti Spam ENgine

URL: http://www.jasen.org [Web]

Library for Java.


SpamAssassin

URL: http://spamassassin.apache.org/ [Web]

Plugin for mail servers.


DansGuardian

URL: http://www.dansguardian.org/ [Web]

Web content filter.

Wiki related resources

Radeox Wiki Render Engine

URL: http://www.radeox.org [Web]

Radeox RE is a Wiki rendering engine implementation that implements the Render Engine API (REA).