http://philip.greenspun.com/panda/ (Guide to Web Publishing)

History of OpenACS:
http://openacs.org/about/history (official history)

* Originally it was the ACS (ArsDigita Community System).
* arsDigita, founded by an MIT professor, was a company which aimed to create an open source web community toolkit.
* Its business model was to build online services using this toolkit. Examples:
    * Development Gateway (WorldBank) www.developmentgateway.org - ACS
    * Knowledge Management System for Siemens Corporation (intranet application) - OpenACS/ACS hybrid
    * Deutsche Bank Intranet - ACS
    * site59.com: Last Minute Travel site (www.site59.com) - ACS
    * scorecard.org: Environmental site which at one point served 30 db-backed page hits a day on an old Sun Pizza Box (Sun UltraSparc II) - proto-ACS
    * photo.net: Community site for camera enthusiasts which serves hundreds of thousands of hits a day (www.photo.net) - ACS
* The architecture used ORACLE.
    * It was proprietary, which discouraged OSS enthusiasts.
    * ORACLE had free licensing for non-profit projects.
    * At the time of development ORACLE was the only decent *ACID ref* ACID database on the market (besides the even more expensive DB2). There were no acceptable OSS alternatives.
    * Until Postgres 7.0, ACID was not available on an OSS DB. With Postgres 7.0 ORACLE was no longer necessary.
* Brief software history
    * Originally a group of utilities written for Hearst Publishing.
    * Eventually a set of services came out of the system, including bboard, classified, and neighbor-to-neighbor.
    * As a project which evolved w/ little to no planning, it had next to no software architecture. It was just a "pile of code" without any sense of standards or large-scale design: lots of replication, defying any modern sense of an engineering standard.
    * OpenACS was originally just a port of that pile of code.
    * ACS 4.0 was an attempt to build a system with more integration and services .. it would have a concept of a kernel and core services on which to build applications.
    * The current version of OpenACS, 4.x, is a port of this code, and it improves on it since arsDigita later abandoned the TCL ACS.
    * OpenACS was started as a project on SourceForge in Dec. 1999 to port ACS to Postgres (originally the project was called ACS/pg). At this time I was at home doing the problem-sets *psets-link* so that I could work at aD.
    * arsDigita eventually moved to a Java platform and was bought by RedHat. The new RedHat CCM *link* is just the ACS-Java rebranded.

Basic Architecture:

    AOLserver -- ns_db API (custom postgres driver) -- PGsql
        |                                                |
    TCL Libraries                                      Tables/Data
    TCL Scripts
    ADP templates
    Images
    Static Files

* AOLserver
    * Used by AOL to serve all of its web content. Developed as NaviServer back when Netscape was at version 0.9; later open sourced as AOLserver. (If you use it, lots of items refer to nsd, the NaviServer daemon .. just a historical detail.)
    * OSS multi-threaded webserver written in C (and extensible in C).
    * Tightly integrated with TCL (a lightweight interpreted scripting language).
    * Well-defined database API which utilizes database pooling.
        * Database pooling is important because it reduces the cost of opening a new connection to near zero: requests share a pool of connections that are already open.
        * Database pooling relies on the multi-threaded nature of the webserver.
    * Apache, the #1 webserver in the world (also OSS), has recently moved from a multi-process server to a multi-threaded server with Apache 2.x. Apache is not tightly integrated w/ anything .. which has its advantages and disadvantages.
    * Threading is important because the cost of starting and maintaining threads is much lower than that of processes. It also allows easy sharing of data between threads instead of needing IPC-based methods.
* RDBMS:
    * Relational DBs are best conceived as a set of related spreadsheets. They are composed of columns representing certain types of data, and rows of data which meet the columns' requirements.
    * Most modern relational databases are filled and accessed via a declarative language called SQL. (Declarative languages are ones where you state the information you want and the system figures out how to retrieve it .. you don't have to know anything about storage. Another example is XSLT.) A SQL standard is released every so often and most databases ignore part or most of it. ORACLE implements outer joins using a non-standard syntax, Postgres only recently implemented outer joins at all, and MySQL is missing so many features that "MySQL" and "SQL-92 compliant" shouldn't be mentioned in the same breath (although it is improving by leaps and bounds).
    * Anatomy of SQL: http://philip.greenspun.com/sql is how I learned SQL. Visual tools like those found in ACCESS attempt to hide the complexity of SQL from end users. Take a little while to learn SQL and you'll be infinitely better off than suffering through those cumbersome tools (which prove inadequate when writing a 250 line query).
    * The most commonly deployed RDBMS is MySQL, for several reasons:
        1. It is OSS and totally free.
        2. It is possibly the fastest of the major databases when it comes to reads. The price of that speed is that it has problems with large quantities of writes, and it is still working on becoming ACID compliant. IMHO it is the best read-only database, but I would never use it for mission critical applications.
    * I've mentioned ACID compliance twice. What is ACID compliance and why should anyone care?

        Atomicity
            Results of a transaction's execution are either all committed or all rolled back. All changes take effect, or none do. That means, for Joe User's money transfer, that both his savings and checking balances are adjusted or neither are.

        Consistency
            The database is transformed from one valid state to another valid state. This defines a transaction as legal only if it obeys user-defined integrity constraints.
            Illegal transactions aren't allowed and, if an integrity constraint can't be satisfied, then the transaction is rolled back. For example, suppose that you define a rule that, after a transfer of more than $10,000 out of the country, a row is added to an audit table so that you can prepare a legally required report for the IRS. Perhaps for performance reasons that audit table is stored on a separate disk from the rest of the database. If the audit table's disk is off-line and can't be written, the transaction is aborted.

        Isolation
            The results of a transaction are invisible to other transactions until the transaction is complete. For example, if you are running an accounting report at the same time that Joe is transferring money, the accounting report program will either see the balances before Joe transferred the money or after, but never the intermediate state where checking has been credited but savings not yet debited.

        Durability
            Once committed (completed), the results of a transaction are permanent and survive future system and media failures. If the airline reservation system computer gives you seat 22A and crashes a millisecond later, it won't have forgotten that you are sitting in 22A and also give it to someone else. Furthermore, if a programmer spills coffee into a disk drive, it will be possible to install a new disk and recover the transactions up to the coffee spill, showing that you had seat 22A.

        * pilfered from http://philip.greenspun.com/panda/databases-choosing

      Maybe I haven't convinced you that an ACID compliant database is important. After all, major software companies use MySQL all the time. You're just going to have to accept that I am right and they are wrong. Once you've worked with a fully ACID compliant database it becomes nerve-racking to work on a non-ACID dbms.
    * Locking: Databases as feature-light as MySQL and as expensive as MS SQLServer use a "write-unfriendly" technique known as "table-level locking".
      When a user is attempting to alter the data in one row of a table, the whole table becomes locked to all other db users. This is fine when you have 1 writer, but when you are trying to handle 20 concurrent writes you will inevitably end up with a queue of people waiting to write. To make matters worse, people who want to read are waiting for the writers. PostgreSQL and ORACLE use row-level locking and only lock the affected rows against other writers, thereby better supporting high numbers of concurrent readers and writers.
    * Indexes: Any large database requires careful management of indexes. Usually when you run a query the database's query planner will figure out how best to use existing indexes to accelerate it. If you run a query against a 20 million row table and one of your join criteria runs against data not in an index, you may be in for a long wait. If you add an appropriate index (appropriate being a key word) you can cut a query which takes 30 minutes down to one which takes 2 seconds.
      -- discuss the dangers of over/improper indexing?
      -- discuss how queries are executed (parse, analyze, plan, implement, return cursor)?
    * Triggers: I mentioned triggers earlier. These are event listeners which fire when a certain event occurs. Triggers are used to maintain auditing tables, to preserve data integrity (if a row is added to or removed from one table you may want to seamlessly add or remove rows in another table), and for a variety of other things.
      -- discuss the dangers of triggers?

Postgres:
* http://developer.postgresql.org/docs/postgres/history.html
  Postgres was originally developed as a project at the University of California, Berkeley in 1986. As the project grew in scope it was used by a jet engine performance package, an asteroid tracking db, and many other projects. By 1993, with a solid core of developers (outside the school) and with mounting maintenance issues, they decided to end the school's work on the project.
  In 1994 a SQL interpreter was added to Postgres and the result was released to the web as Postgres95. To avoid having to rename the project every year, it was renamed PostgreSQL in 1996. Since then they have continuously added new features, making it the most ANSI compliant of all the OSS databases.
* PostgreSQL issues/capabilities.
  ORACLE allows you to write stored procedures in a "procedural" version of SQL called (appropriately) PL/SQL. PostgreSQL does ORACLE one better, allowing you to write stored procedures in most of the major programming languages (C, C++, PERL, Python, TCL, Java).
  The major RDBMS vendors' products (meaning ORACLE and IBM DB2) have several advantages over PostgreSQL. The biggest are that they are better with enormous data sets and data flows, and that they are stronger at clustering/distribution/mirroring. ORACLE has a setup to optimize data security as well as to accelerate writes and reads. It only requires 44 SCSI disks: it distributes its many different functions over 22 disks and then mirrors each of them. With these sorts of optimizations they should open source their database and go into the hardware business.
  -- Should probably restructure this

ACS 4.x configuration

AOLserver
* ${NSHOME}/your-server.tcl
  Most acs configurations set NSHOME=/home/aolserver but I prefer NSHOME=/etc/aolserver
  When you start aolserver the command is
      aolserver -u (user) -g (group) -it your-server.tcl
          -i = installed mode -- logging is done in your defined log file
          -f = foreground mode -- piped to stdout
          -t = you are using the new-school .tcl files. They used to have a .ini structure which sucked and was replaced by the excellent .tcl method.
* ${ACS_LOG_DIR}/access.log and error.log
  A standard aolserver/acs install will create two logs.
  1) an access log that stores the individual requests sorted by date.
      192.168.125.30 - - [15/Apr/2003:18:28:15 +0500] "GET /site-info?email=tristancohen%40yahoo%2ecom HTTP/1.1" 200 0 "http://192.168.124.8/create-administrator" "Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.0.1) Gecko/20030306 Camino/0.7"
     is a standard access log entry.
      192.168.125.30 :: Requesting IP
      [15/Apr/2003:18:28:15 +0500] :: Date (duh)
      "GET /site-info?email=tristancohen%40yahoo%2ecom HTTP/1.1" :: Abbreviated HTTP request
          For a great reference on the HTTP protocol check out: http://www.w3.org/Protocols/ (if you are a masochist) or http://www.jmarshall.com/easy/http/ (I haven't read it yet .. but it seems good)
      200 :: HTTP response code
      0 :: Number of bytes sent in the response body
      "http://192.168.124.8/create-administrator" :: Referring URL (sent by the user's browser)
      "Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.0.1) Gecko/20030306 Camino/0.7" :: Information on the requesting client (sent by the user's browser)
* ${NSHOME}/modules/tcl :: Globally deployed TCL modules (part of the standard aolserver release). Includes the http tcl module, which is very handy if you want to have your server do HTTP POST operations (these are not available through the standard aolserver TCL API).
* ${ACS_HOME}/your-server
  The standard acs install puts all your acs files under /web/your-server . If you are more into the standard debian setup you can put them under /var/aolserver/your-server .

ACS Directory Structure:
* ./www :: The directory where your standard ACS pages are located. During the old 3.x days your packages and their pages were under this directory. With the advent of the new packaging system only a few very standard pages are under this directory. Most are now located under the packages directory.
* ./tcl :: This directory stores all the bootstrapping libraries for the system. During the 3.x days all the libraries were placed under this directory.
* ./packages :: This is the main directory which stores all the apm's (ACS Packages).
  Their format mirrors the base server format.
    * ./package-name :: The subdirectory for a particular package.
        * ./www :: The pages for the package are located under this directory.
        * ./tcl :: The libraries to be loaded at startup are under this directory.
        * ./sql :: The data model for this package is located here.
            * ./oracle :: Under this subdirectory is the data model as set up for ORACLE.
            * ./postgres :: This subdir houses the dm for postgresql.
        * ./package-name.info :: XML file describing the required dependencies and the files w/in the package.

ACS File Types:
* .sql :: The SQL scripts used for package installation.
* /tcl/*.tcl :: These files (sourced in "ls" order) contain the libraries loaded at system startup time.
* /www/*.tcl :: These files are the scripts which assemble the data for a template.
* /www/*.adp :: These are the standard templates for any individual package.
* /(tcl|www)/*.xql :: These files hold the database-specific queries. The OpenACS people have decided to try and keep ORACLE support for their software through this "crude" method of database abstraction.

Anatomy: /tcl/*.tcl
Excerpt from ${ACS_HOME}/packages/acs-tcl/tcl/object-procs.tcl

    ad_library {

        Object support for ACS.

        @author Jon Salz (jsalz@arsdigita.com)
        @creation-date 11 Aug 2000
        @cvs-id $Id: object-procs.tcl,v 1.2 2002/09/10 22:22:14 jeffd Exp $

    }

ad_library stores some basic text in a javadoc-esque format. When the library is loaded, and if documentation creation is enabled, the block of text in {}'s is written into a documentation file. Similarly, ad_proc documents each of the procs inside the libraries.

    ad_proc -private acs_lookup_magic_object { name } {
        Returns the object ID of a magic object (performing no memoization).
    } {
        return [db_string magic_object_select {
            select object_id from acs_magic_objects where name = :name
        }]
    }

ad_proc is a replacement for the TCL standard proc. It allows you to declare a procedure public or private. db_string uses the ACS db_api discussed later.
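The db_string call in the excerpt runs a query that is expected to return a single value, with :name bound from the caller's variable rather than pasted into the SQL text. Here is a minimal Python sketch of that idea, assuming SQLite (from the Python standard library) as a stand-in for PostgreSQL/ORACLE. The name db_string mirrors the excerpt, but this implementation, its "default" parameter, the table layout, and the sample row are all illustrative assumptions, not the ACS db_api.

```python
import sqlite3


def db_string(conn, sql, bind, default=None):
    """Run a query expected to yield one row with one column; return that value.

    `bind` maps each :placeholder in `sql` to a value, so the value is passed
    to the database separately instead of being spliced into the SQL text.
    """
    row = conn.execute(sql, bind).fetchone()
    return default if row is None else row[0]


def acs_lookup_magic_object(conn, name):
    """Return the object ID of a magic object (performing no memoization)."""
    return db_string(
        conn,
        "select object_id from acs_magic_objects where name = :name",
        {"name": name},
    )


# Hypothetical data: the table layout and the row are made up for the demo.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table acs_magic_objects (name text primary key, object_id integer)"
)
conn.execute("insert into acs_magic_objects values ('example_object', 42)")

print(acs_lookup_magic_object(conn, "example_object"))
# A hostile-looking name is treated as plain data and simply matches no row.
print(acs_lookup_magic_object(conn, "no' or '1'='1"))
```

Because the name travels as a bind variable, quoting tricks in the input cannot change the shape of the query, which is the main reason to prefer :name over string concatenation.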
There are many other procs inside this file.

Excerpt from ${ACS_HOME}/packages/lars-blogger/www/index.tcl
  Show them this page in action, then step through the source code.
Excerpt from ${ACS_HOME}/packages/lars-blogger/www/index.adp
  Show them the page and explain the template.

OpenACS from Startup to Page request:
OpenACS starts up with AOLserver starting up. The first thing aolserver does is load its compiled modules. A working version of OpenACS requires:
    nssock.so (socket listener)
    nslog.so (logger)
    postgres.so (postgres driver)
    nssha1.so (encryption library used for encrypting passwords etc.)
    nscache.so (caching system which allows blocks of text to be shared between threads)
    nsrewrite.so (similar to mod_rewrite for apache .. used by the request processor to internally redirect requests)
    nsxml.so (link between aolserver and libxml2)
    nsssl.so (the secure socket listener .. if you aren't worried about security this is unnecessary, but all real web services should be)
All of these libraries are written in C for speed. Many of them don't come w/ the standard AOLserver distribution and must be compiled separately.

From there it loads the libraries under ${ACS_HOME}/tcl . These libraries tell it to start up the bootstrapping package. The bootstrapping package includes all the basic libraries that every package needs to load. These include the db-api, the package manager, the utilities, and the procs for making procs. From there it:
* Connects to the database to make sure it works as expected.
* Loads up the acs-tcl package, which includes the core tcl libraries which every package on the system can require.
* Checks the database to see what other packages are currently installed (if none, it starts the installation process).
  -- Demonstrate the installation process? If I can get it to work on my MacTop.
* Loads the site map and confirms that there is an admin user.
* Loads some basic parameters.
* Loads the libraries of the individual packages.
* Registers the filters necessary for the Request Processor.
* Binds to the sockets.
* Starts to run scheduled procedures (a facility of AOLserver).

Receiving Requests:
ACS 3.x used to have file extensions.
* /pants/buckle.html :: loads the static html ${ACS_HOME}/www/pants/buckle.html
* /pants/zipper.tcl :: ${ACS_HOME}/www/pants/zipper.tcl .. runs a script which would ns_write a response to the request. Until the script ends the connection is held open, so you can keep writing to the connection as many times as you want, extending the data. This is typically bad policy, but it is good if you want to give an interactive feeling when reporting on the progress of some action .. like installation.
* /pants/leg.adp :: ${ACS_HOME}/www/pants/leg.adp is an AOLserver template. These allow programmers to mix code and HTML in the same files .. similar to JSP.
  It should be noted that .adp files are completely inadequate to their purpose. I've worked with several template designers who were uncomfortable with the mix of code and static HTML. There were many real problems with this system and it was not as capable as the JSP syntax, which allowed you to have a chunk of HTML in a loop (for instance). Eventually a new templating system was created (shown later) which made it easier for the graphic designers to move variables and html around. It is completely custom to the ACS and suffers because of that. Some of us are pushing for them to adopt a more standard HTML::Template syntax, based on a PERL module. It has been ported to PHP, Java, Python, etc. etc.

ACS 4.x :
There was a big push at aD in mid 2000 for what are called abstract URLs. This would allow you to eliminate file extensions and have a request processor cleverly figure out what pages to load.
This was done to hide implementation details from users and prevent their bookmarks from breaking when you altered the implementation. ACS 3.x had a hack to support this .. but it wasn't put into mainstream usage until 4.x. With the birth of the request processor, a site map, abstract URLs, and packages, a whole new level of overhead was added to each request.

When a request is received for /ticket-tracker (for example) the request processor (RP) receives the call.
  IMP Detail: The request processor uses an aolserver facility called filters. Filters are assigned to url patterns (in the case of the RP the pattern is /*). The filter then runs some code before exiting with a filter code.
      FILTER_OK :: run the next filter or load the page.
      FILTER_BREAK :: don't run any more filters and load the page.
      FILTER_RETURN :: close the connection; the filter has done all the work.

The request processor has to do several things:
* Check to see if ${ACS_HOME}/www/ticket-tracker.* exists .. if it doesn't, then
* figure out which package is mounted at /ticket-tracker/ by checking the site map.
* Since there is an instance of ticket-tracker mounted at /ticket-tracker, we then look for ${ACS_HOME}/packages/ticket-tracker/www/index.*
* Since there is an index.tcl we load that.
* The RP also stores some variables: the package_id on the map, the site_node_id, the user's form variables, and the user's cookies.
* The RP will try to load files by extension in the order tcl, adp, html, htm; failing those, it takes whichever matching file comes first alphabetically. (Abstract URL support)

Site Map:
Each package can be instantiated. Package instances can be mounted on the site map. Some packages are developed so that different instances can share the same tables but are completely invisible to each other. If you are setting up a group of bboards for different communities .. you want the communities to all have a bboard but not to read each other's bboard. A package instance can be mapped to any number of site nodes.
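The lookup the request processor performs can be sketched in a few lines. This is a simplified Python model under stated assumptions, not the actual RP code: the function names (resolve_abstract, find_package, resolve), the dictionary used as a site map, and the temporary directory layout are all hypothetical, and package pages are assumed to live under the package's www directory as in the directory structure described earlier. It only models which file is picked, not filters, cookies, or the tcl/adp pairing.

```python
import os
import tempfile

# Extension order stated in the text; anything else falls back to alphabetical.
EXTENSION_ORDER = ["tcl", "adp", "html", "htm"]


def resolve_abstract(path_without_ext):
    """Return the file that serves an extensionless path, or None."""
    directory, stem = os.path.split(path_without_ext)
    if not stem or not os.path.isdir(directory):
        return None
    by_ext = {}
    for filename in os.listdir(directory):
        root, ext = os.path.splitext(filename)
        if root == stem and ext:
            by_ext[ext[1:]] = filename
    for ext in EXTENSION_ORDER:
        if ext in by_ext:
            return os.path.join(directory, by_ext[ext])
    if by_ext:
        # No preferred extension present: take the first one alphabetically.
        return os.path.join(directory, by_ext[min(by_ext)])
    return None


def find_package(site_map, url):
    """Find the longest mounted prefix of url; return (package_key, rest)."""
    parts = url.strip("/").split("/")
    for i in range(len(parts), 0, -1):
        node = "/" + "/".join(parts[:i])
        if node in site_map:
            return site_map[node], "/".join(parts[i:])
    return None, None


def resolve(acs_home, site_map, url):
    """Pick the file for url: the global www directory first, then the site map."""
    page = resolve_abstract(os.path.join(acs_home, "www", url.strip("/")))
    if page is not None:
        return page
    package_key, rest = find_package(site_map, url)
    if package_key is None:
        return None
    return resolve_abstract(
        os.path.join(acs_home, "packages", package_key, "www", rest or "index")
    )


# Hypothetical server layout built in a temporary directory for the demo.
with tempfile.TemporaryDirectory() as acs_home:
    os.makedirs(os.path.join(acs_home, "www"))
    pkg_www = os.path.join(acs_home, "packages", "ticket-tracker", "www")
    os.makedirs(pkg_www)
    for name in ("index.tcl", "index.adp"):
        open(os.path.join(pkg_www, name), "w").close()
    site_map = {"/ticket-tracker": "ticket-tracker"}
    served = resolve(acs_home, site_map, "/ticket-tracker")
    print(os.path.relpath(served, acs_home))
```

Matching the longest mounted prefix is one straightforward way to let a package instance mounted at /ticket-tracker also answer for deeper URLs such as /ticket-tracker/admin; the real site map lookup may differ in detail.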
How to build applications
What the advantages are over starting itself
Weaknesses/Advantages architecture point of view
Pictures