1. OpenACS Title Slide

2. What is a Web Community?
   * Sites like slashdot.org, yahoo! groups, imdb.com, and even amazon.com
     (has community features), blogspot.com
   * Not web communities: most ecommerce sites, most sites advertising a
     company, service or band.
   * common features: bulletin boards (bboards), news, comments, user
     submitted stories
   * advantages: building a web community creates interest and publicity in
     a sideways manner. Site is useful besides advertising. Shares knowledge,
     reduces need for organization to produce all the content.
   * disadvantages: requires programmers and maintainers. Static sites can be
     run w/ almost no thought besides some basic UI design and the use of
     Dreamweaver etc.

   Needs of a web community:
   1. magnet content authored by experts
   2. means of collaboration (bboard, comments, etc)
   3. powerful facilities for browsing and searching both magnet content and
      contributed content (site-wide search)
   4. means of delegation of moderation (filters to block posters, content
      rating)
#5. means of identifying members who are imposing an undue burden on
#   the community and ways of changing their behavior and/or
#   excluding them from the community without them realizing it (bozo-filter)
   6. means of software extension by community members themselves (open
      source)

3. Who wrote OpenACS, who uses it, and why is it open source?
   * Started by arsDigita, later taken over by the OpenACS gang.

   Used by:
   * Development Gateway (WorldBank) www.developmentgateway.org - ACS
   * Knowledge Management System for Siemens Corporation (intranet
     application) - OpenACS/ACS hybrid
   * Deutsche Bank Intranet - ACS
   * site59.com: Last Minute Travel site (www.site59.com) - ACS
   * scorecard.org: Environmental site which at one point served 30 db-backed
     page hits a second on an old Sun Pizza Box (Sun UltraSparc II) -
     proto-ACS
   * photo.net: Community Site for camera enthusiasts serves hundreds of
     thousands of hits a day.
     (www.photo.net) - ACS
   * Software companies make most of their money via services, not licenses.
     In the web world this is especially the case. Open source reduces
     development costs and gains free publicity, free bug fixes and free
     packages.

4. History of OpenACS
   * Philip Greenspun, Ben Adida, and crew wrote a website for Hearst
     Publications in the mid-90s.
   * Used the Illustra database, then moved to Oracle as Oracle was a much
     better database.
   * Philip founded aD (arsDigita) to build and market the ACS
   * in the process of building aD, convinced AOL to open source AOLserver
   * PostgreSQL came out and was a full featured open source database
   * aD gets VC money
   * Ben Adida and some others started to port the ACS to PostgreSQL so that
     it was built entirely on an open source platform.
   * aD decides to totally rewrite the ACS
   * 4.0 is released in mid-2000
   * arsDigita decides to become more "market savvy" and move away from Tcl
     to Java
   * VC-appointed CEO starts to run company like a dot-com
   * Philip tries to take back over the company
   * OpenACS crew ports ACS 4.0
   * VCs spend most of remaining capital paying off Philip
   * aD goes under and is bought out by Red Hat
   * OpenACS is used by many small OSS companies to work on lots of projects;
     it is one of two or three major OSS community systems.

6. OpenACS 3-tiered Architecture (Diagram)

   Browser  <-->  WebServer          <-->  Database
   viewer         application logic        (Data Model & Storage)

   **** Outline general use cases .. multiple users accessing website at
   same time.

6. What is a database?
   * Method of storing, organizing and rapidly retrieving data
   * Robust to multiple writes and reads at the same time
   * Through the '70s mostly hierarchical databases (file-system on steroids)
   * Hierarchical databases were not robust to changing data models
   * The relational database was born
   * Basically a bunch of spreadsheets (columns and rows) with a declarative
     language (SQL) used to retrieve the data

7. Responsibilities of Postgres + 8. Postgres vs.
File System -- ACID Fundamentals
   * ACID section
   * efficient retrieval of data via indexes (million row file, searching for
     one row; compounded when crossed w/ another million row file to
     coordinate the search)
   * event listeners (triggers)
   * good system for coordinating data retrieval (joins)
       store information about the user in one table
       store information about the user's purchases in another table
       easily find out who bought pants on Oct-21st
   * more overhead on writes and reads maintaining indices etc.
   * embedded procedural language for performing common tasks inside the
     database.

9. Webserver Layer
   * what is a webserver
   * HTTP .. simple open protocol for Client-Server
   * anatomy of a standard page
   * some static, some dynamic, some database dynamic content.

10. AOLserver vs. Apache
   * why AOLserver was used instead of Apache

12. Tcl -> why Tcl?
   * Toy language
   * weak on data structures (only the list and associative array)
   * not buzzword compliant
   * weaker on heavy infrastructure if not used carefully
   * slow
   * turing complete
   * satisfies 90% of a website's needs (Vignette StoryServer uses it too and
     they charge, or used to charge, 10's of thousands of $'s)
   * rapid development .. can develop sites in much less time than Java
   * on the web everything is a string .. but your fundamental data isn't

13. AOLserver Native Services
   1. Database API & pooling
      * ns_db api
      * pooling vs. new connections
      * no database swamping
   2. Filters
      * violates one URL - one file
      * can be used for authorization or redirection
      * invisible to developers, so you can stack 3 million of them, slowing
        requests, and not realize it.
   3. Templating
      * ADP's to mix Tcl and HTML code
      * scares HTML-monkeys
   4. Connection API
      * Unified way to get basic information about requests and the client.
        Only based on client .. not on information special to the system.
        ns_conn url

14. Why are they insufficient?

15. 3.x vs.
4.x
   * Flat structure (use examples from documents)
   * good for single look feel websites, monolithic structure
   * everything installed in one batch ..
   * services tended not to be autonomous ..
   * pile of code .. not well designed
   vs.
   * strong on infrastructure
   * packages allow real separation of functionality, tendency to design more
     reusable components
   * didn't have to install everything
   * good for monolithic and multi-purpose deployment

Sections:

Database Services
   1. Data Model (Compared to Vignette)
      * Vignette had some basic utilities and a v. basic data model which was
        insufficient for building a Vignette site. You ended up having to
        write a lot of your data model while building on it.
      * OACS has strong data model for site-wide services. Data modelling is
        a major portion of site-design. Data model tested in a wide variety
        of situations so it tends to be pretty robust.
      * Data model is easily extensible .. the integration w/ the database is
        tight so it is easy to optimize. (See database independence)
   2. Database API (modifications, example of advantage of Tcl .. show Java
      code)
      * db_1row
   3. Basic Object Interface
      * All things which require site-wide services are an extension of
        ACS_OBJECTS
   4. Database Functions
   5. "Database Independence"
   6. XQL

Website Structure
   0. 1 URL = 1 File
   1. Packages (directory structure slide)
   2. Package Instances
   3. Site Map & Site Nodes

Request Processor
   0. Why it exists
   1. Anatomy of a request
   2. How it handles a request
   3. Templating (SLIDE?)
   3a. ad_page_contract, adp's vs. html::template
      * looping
      * conditional logic (if-then)
      * includes
      * reverse-includes (master)
   4. Subsites

Permissions:
   1. Problem defined
   1aa. users, objects, privileges
   1ab. Users and Persons
   3. users, parties, groups
   4. contexts
   5. API .. what it gives you
   6. Utter Failure
      * doesn't scale
      * doesn't meet needs

Security
   1. Basic Problem / Security Scenarios
      1. Packet Sniffer
      2. Left computer on (browser history, showing on screen, etc.)
      3.
         Hacker/Defecting DB Admin
   2. HTTP vs. HTTPS
   2a. ad_secure_conn_p
   2b. HTTP authorization code is insecure
   3. Passwords, emails, one-way hashes
   4. Authorization/Authentication
   5. How to steal an identity
   6. Always check your passwords
   7. Don't store data
   8. 2 signs that a website should not be trusted ..

Self-Documenting Server:
   1. ad_proc, ad_library, ad_whatever

      ad_proc -flags proc_name { args } {
          javadoc style @info
      } {
          code ....
      }
      (the -flags were pretty meaningless last time I checked)

      ad_library {
          javadoc style @info
      }

      stores data in memory array and you can read the documentation through
      the /doc interface.

A Typical ACS Page:
   1. Database hits
   2. ad_page_contract
   3. template

Cache (Poorly Done problem):
   1. Memory Caching
      * It's fast; w/ AOLserver it is easy to share information between
        threads since they share a memory address space.
      * Causes memory usage to increase; if caches are commonly used and
        never purged they may use up RAM and push the system into swap
        space, which slows down every action on the system.
      * In a multiple front end server environment there may be cache
        inconsistency. There is no efficient mechanism to update the caches
        on each of the servers. Someone may reload the same page 4 times and
        see 4 different results.
      * Cache does not persist between server reboots (depending on stability
        of system this may not be a major concern but wait until you are
        slashdotted).
   2. Database Caching
      * Works between multiple front ends.
      * Consistent between reboots.
      * More expensive to write and read.
      * With a massive # of front ends with replicated databases you will
        have cache inconsistency again.
   3. Squid Caching
      * Great for mostly static content
      * Squid can act as a proxy/load balancer and can keep oft-requested
        pages which don't change cached in memory, without even forwarding
        the requests to the webservers.
      * Tiny variations like "Welcome Tristan" instead of "Welcome Armen"
        can stop the page from being cached.
   4.
      Amazon/Google-Style redirect caching
      * Probably the best solution in massive deployments.
      * A user is redirected to the same server over and over again.
      * Google has special indexes based on search terms so you are always
        directed to a machine which is specially tuned for your search
        criterion.
      * All the advantages of memory caching and squid caching without the
        problems of cache inconsistency.
      * Resolve memory leaks by having the cache flush old unused data.
   5. Since OpenACS was designed for deployment with one to a couple of front
      ends in mind, it focuses on memory caching. util_memoize stores data in
      a set of key value pairs with a timestamp. The oldest data is flushed
      as memory usage grows above a certain amount. Database caching is easy
      to implement.

5. OpenACS vs. Zope vs. Roll Your Own
   * OACS - Tightly integrated w/ the database
   * Zope - Uses custom object database for many parts, can also run on top
     of an RDBMS.
   --
   * OACS - standard site-wide method of handling users, permissions,
     site-wide search, templating, packaging, site-maps
   * Zope - ditto .. may run into trouble when concepts need to exist in two
     places .. like users.
   --
   * OACS - most work done in editor of choice on top of OS of choice
   * Zope - lots of work done in browser interface ..
   --
   * OACS - non-simplistic install, highly customizable
   * Zope - easy to install, less obviously customizable
   --
   * OACS - depending on level of customization, upgrading may be painful if
     it involves changes to the database
   * Zope - Probably easier to upgrade
   --
   * OACS - Tcl, weak on data structures, simple to learn and implement in,
     lots of custom constructs inside of OACS designed to accelerate
     development.
   * Zope - Python, strong on data structures, excellent language ..
     DTML, semi-programming language w/ HTML-like syntax.
     Python is famous for being a compact and simple language; in its
     documentation Zope proudly (and prob. incorrectly) indicates that it
     ignores the benefits of Python.
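
Sketch: timestamped memory cache

The Cache section describes util_memoize as storing key value pairs with a
timestamp and flushing the oldest data once usage grows past a limit. A minimal
Python sketch of that idea follows. It is an illustration only, not OpenACS
code: the class name MemoizeCache, its methods, and the max_entries limit (an
entry count standing in for a memory-usage threshold) are all assumptions.

```python
import time
from collections import OrderedDict


class MemoizeCache:
    """Timestamped key/value cache that evicts its oldest entries first.

    Hypothetical illustration; not the util_memoize implementation.
    """

    def __init__(self, max_entries=1000, clock=time.monotonic):
        self.max_entries = max_entries   # assumed stand-in for a memory limit
        self.clock = clock               # injectable so callers can control time
        self._entries = OrderedDict()    # key -> (timestamp, value), oldest first

    def get(self, key, compute, max_age=None):
        """Return the cached value for key, calling compute() on a miss.

        An entry older than max_age seconds is treated as a miss.
        """
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None:
            stamp, value = entry
            if max_age is None or now - stamp <= max_age:
                return value
            del self._entries[key]       # stale: drop it and recompute
        value = compute()
        self._entries[key] = (now, value)
        # Flush the oldest entries once the cache grows past its limit.
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)
        return value

    def flush(self, key):
        """Drop one entry, e.g. after the underlying database row changes."""
        self._entries.pop(key, None)


if __name__ == "__main__":
    cache = MemoizeCache(max_entries=2)
    # The second call is served from memory; the lambda runs only once.
    print(cache.get("user:1", lambda: "row data"))
    print(cache.get("user:1", lambda: "row data"))
```

Like any per-process memory cache, this sketch has the drawbacks listed under
Memory Caching: nothing survives a restart, and separate front ends would each
hold their own, possibly inconsistent, copy.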