Distributed Information Systems

Winter Semester 2002-2003, SSC, mandatory, orientation IS, Sem 9 SI optional
Lecture Team
Prof. Karl ABERER (Lectures) karl.aberer @ epfl.ch +41-21-693 4679 PSE A 1.32
Philippe Cudré-Mauroux (Exercises) philippe.cudre-mauroux @ epfl.ch +41-21-693-6787 PSE A 1.51
Anwitaman Datta (Exercises) anwitaman.datta @ epfl.ch +41-21-693 6615 PSE A 1.62
Pre-exam office hours
10.02 2-4 Philippe (PSE-A 1.51)
13.02 2-4 Philippe (PSE-A 1.51)
16.02 2-4 Anwitaman (PSE-A 1.62)
18.02 10-12 Anwitaman (PSE-A 1.62)
Sample exam
Sample exam [Please note that we will NOT provide any solution for this sample exam.]
Time and Place

Lecture: Tuesday 8-10 Room INM 200
Exercise: Tuesday 10-11 Room INM 200
Office hours:
Wednesday 14:00-15:00 (Philippe, Room PSE A1.51)
Friday 14:00-15:00 (Anwitaman, Room PSE A1.62)

Description
This course introduces in detail several key technologies underlying today's distributed information systems. After introducing the nonstandard data models now used on the Web for information representation, we examine how this information is processed at increasing levels of abstraction, from the physical aspects of managing distributed data up to the extraction of new information from existing data by means of data mining. The specific focus is on managing Web and mobile data.
Prerequisites
Students are assumed to be familiar with the material of the Relational Databases course.
Final Exam
There will be a written exam. The exam will consist of conceptual questions similar to those posed throughout the lecture and of examples similar to those from the exercises.
Exercises
There will be weekly exercises with practical instances of the methods introduced in the lectures. The exercises will be graded and will contribute up to 20% (2% per exercise) to the final grade. They count as a bonus only, i.e. they are not taken into account if the exam grade is higher than the exercise grade.
Lecture Schedule
Date Slides Homework Q&A's
Introduction
21.10.03 Introduction to DIS (pdf) (pdf)/ XML Intro (pdf)
Semi-structured Data
28.10.03 XML Storage and Filtering (pdf) (Exercise) (Solution)
04.11.03 Graph databases (pdf) (Exercise) (Solution)
11.11.03 RDF and Semantic Web/OIL (pdf) (Exercise) (Solution)
Distributed Data Management
18.11.03 Schema Fragmentation (pdf) (pdf - 2nd part) (Exercise) (Solution)
25.11.03 Mobile Data Management (pdf) (Exercise) (Solution)
02.12.03 P2P Systems I (pdf) (Exercise) (Solution)
09.12.03 P2P Systems II (pdf) (Exercise, Network) (Solution)
Information Retrieval and Data Mining
06.01.04 Special Session (Semantic gossiping) (Exercise)
13.01.04 Vector space retrieval (pdf) (Exercise) (Mathematica file) (Solution)
20.01.04 Advanced Retrieval (Exercise) (Solution)
27.01.04 Association Rule Mining (pdf) (pdf) (Exercise) (Solution)
03.02.04
Literature
Books
M. Tamer Özsu, Patrick Valduriez: Principles of Distributed Database Systems, Second Edition, Prentice Hall, ISBN 0-13-659707-6, 1999.
S. Abiteboul, P. Buneman, D. Suciu: Data on the Web: From Relations to Semistructured Data and XML, Morgan Kaufmann, 2000.
Ricardo Baeza-Yates, Berthier Ribeiro-Neto: Modern Information Retrieval (ACM Press Series), Addison-Wesley, 1999.
Jiawei Han: Data Mining: Concepts and Techniques, Morgan Kaufmann, ISBN 1-55860-489-8, 2000.
P. Baldi, P. Frasconi, P. Smyth: Modeling the Internet and the Web, Wiley 2003.
Papers
Daniel Barbará: Mobile Computing and Databases - A Survey. TKDE 11(1): 108-117 (1999)
Swarup Acharya, Rafael Alonso, Michael J. Franklin, Stanley B. Zdonik: Broadcast Disks: Data Management for Asymmetric Communications Environments. SIGMOD Conference 1995: 199-210
Sohail Hameed, Nitin H. Vaidya: Log-Time Algorithms for Scheduling Single and Multiple Channel Data Broadcast. MOBICOM 1997: 90-99
Tomasz Imielinski, S. Viswanathan, B. R. Badrinath: Data on Air: Organization and Access. TKDE 9(3): 353-372 (1997)
Ion Stoica, Robert Morris, David Karger, Frans Kaashoek, Hari Balakrishnan. Chord: A Scalable Peer-To-Peer Lookup Service for Internet Applications. Proceedings of the ACM SIGCOMM, 2001.
Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, Scott Shenker. A Scalable Content-Addressable Network. Proceedings of the ACM SIGCOMM, 2001.
M.A. Jovanovic, F.S. Annexstein, and K.A. Berman. Scalability Issues in Large Peer-to-Peer Networks - A Case Study of Gnutella. University of Cincinnati, Laboratory for Networks and Applied Graph Theory, 2001. http://www.ececs.uc.edu/~mjovanov/Research/paper.ps
Frank Dabek, Emma Brunskill, M. Frans Kaashoek, David Karger, Robert Morris, Ion Stoica, Hari Balakrishnan. Building Peer-to-Peer Systems With Chord, a Distributed Lookup Service. Proceedings of the 8th Workshop on Hot Topics in Operating Systems (HotOS-VIII), 2001. http://www.pdos.lcs.mit.edu/papers/chord:hotos01/hotos8.pdf
Ian Clarke, Oskar Sandberg, Brandon Wiley, and Theodore W. Hong. Freenet: A Distributed Anonymous Information Storage and Retrieval System. Designing Privacy Enhancing Technologies: International Workshop on Design Issues in Anonymity and Unobservability. LNCS 2009. Springer-Verlag 2001. http://www.freenetproject.org/index.php?page=icsi-revised
Karl Aberer. P-Grid: A self-organizing access structure for P2P information systems. Proceedings of the Sixth International Conference on Cooperative Information Systems (CoopIS 2001), 2001. http://lsirwww.epfl.ch/publications/tr/TR2001-016.pdf
Michael W. Berry, Susan T. Dumais, Gavin W. O'Brien. Using Linear Algebra for Intelligent Information Retrieval. Department of Computer Science, University of Tennessee, Knoxville, Dec. 1994.
Gio Wiederhold. Mediators in the architecture of future information systems. IEEE Computer Magazine, March 1992.
L. Liu, L.L. Yan, M.T. Özsu. Interoperability in large scale distributed information delivery systems.