openacs-presentation.txt

http://philip.greenspun.com/panda/

Guide to Web Publishing

History of OpenACS:

http://openacs.org/about/history (official history)

* Originally was the ACS

   * arsDigita, a company founded by an MIT professor, aimed to create
     an open source web community toolkit.

   * business model was to build online services using this toolkit.
     
       *  Development Gateway (WorldBank) www.developmentgateway.org - ACS

       *  Knowledge Management System for Siemens
          Corporation. (intranet application) - OpenACS/ACS hybrid
  
       *  Deutsche Bank Intranet - ACS

       *  site59.com:  Last Minute Travel site (www.site59.com) - ACS

       *  scorecard.org:  Environmental site which at one point served 
                          30 db-backed page hits a day on an old Sun Pizza Box
                          (Sun UltraSparc II) proto-ACS

       *  photo.net: Community Site for camera enthusiasts serves
                      hundreds of thousands of hits a day.  (www.photo.net) - ACS

   * architecture used ORACLE
       
       * it was proprietary so it was discouraging for OSS enthusiasts

       * ORACLE had free licensing for non-profit projects

       * at the time of development ORACLE was the only decent ACID
         (*ACID ref*) database on the market (besides the even more
         expensive DB2).  There were no acceptable OSS alternatives.
 
       * Until PostGres 7.0 ACID was not available on an OSS DB.  With
         Postgres 7.0 ORACLE was no longer necessary

   * brief software history

       * originally a group of utilities written for Hearst
         Publishing.
  
       * eventually a set of services came out of the system including
         bboard, classified, and neighbor-to-neighbor

       * as a project which evolved w/ little to no planning it had
         next to no software architecture.  It was just a "pile of
         code" without any sense of standards or large-scale design.
         It had lots of duplication and defied any modern sense of an
         engineering standard.

       * Originally, OpenACS was just a port of this pile of code.

       * ACS 4.0 was an attempt to build a system with more
         integration and services .. it would have a concept of a
         kernel and core services on which to build applications.
 
       * The current version, OpenACS 4.x, is a port of this code and
         improves on it, since arsDigita later abandoned the TCL ACS.

   * OpenACS was started as a project on SourceForge in Dec. 1999 to
     port ACS to Postgres (Originally the project was called ACS/pg).
     At this time I was at home doing the problem-sets *psets-link* so
     that I could work at aD.

   * arsDigita eventually moved to a Java platform and was bought by
     RedHat.  The new RedHat CCM *link* is just the ACS-Java
     rebranded.

Basic Architecture:


    AOLserver -- ns_db API (custom postgres driver) -- PGsql
 
       |                                                 |

     TCL Libraries                                    Tables/Data
     TCL Scripts
     ADP templates
     Images
     Static Files



*  AOLServer

   *  Used by AOL to serve all of its web content.  Developed as
      NaviServer back when Netscape was at version 0.9, it was later
      open-sourced as AOLServer.  (If you use it, lots of items refer to
      nsd, the NaviServer daemon .. just a historical detail.)

   *  OSS multi-threaded webserver written in C (and extensible in C) 

       * tightly integrated with TCL (a lightweight interpreted scripting language)

       * well defined database API which utilized database pooling

       * database pooling is important because requests share a set
         of already-open database connections, which reduces the cost
         of opening a new connection for each request to near zero.

       * database pooling relies on the multi-threaded nature of the
         webserver.
  
       * Apache, the #1 webserver in the world (also OSS) has recently
         moved from a multi-process server to a multi-threaded server
         with Apache 2.x.  Apache is not tightly integrated w/
         anything .. which has its advantages and disadvantages.  

       * Threading is important because the cost of starting and
         maintaining threads is much lower than for processes.  It also
         allows easy sharing of data between threads instead of
         needing to use IPC-based methods. 
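
   As a rough illustration of the pooling idea (not AOLserver's actual
   ns_db implementation), here is a minimal Python sketch.  The
   ConnectionPool class, the in-memory SQLite database, and the pool
   size are all assumptions made up for the example: three connections
   are opened once and then shared by twenty request threads.

```python
import queue
import sqlite3
import threading


class ConnectionPool:
    """Hand out a fixed set of already-open connections to worker threads."""

    def __init__(self, size):
        self._idle = queue.Queue()
        self.opened = 0
        for _ in range(size):
            # check_same_thread=False lets a connection be used by a thread
            # other than the one that created it (one thread at a time).
            self._idle.put(sqlite3.connect(":memory:", check_same_thread=False))
            self.opened += 1

    def run(self, sql):
        conn = self._idle.get()        # blocks until a connection is free
        try:
            return conn.execute(sql).fetchall()
        finally:
            self._idle.put(conn)       # hand it back for the next request


def serve_requests(pool, n_requests):
    """Simulate n_requests page hits, each handled on its own thread."""
    results = []
    lock = threading.Lock()

    def handle():
        rows = pool.run("select 1")
        with lock:
            results.append(rows[0][0])

    threads = [threading.Thread(target=handle) for _ in range(n_requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


pool = ConnectionPool(size=3)
results = serve_requests(pool, n_requests=20)
print(len(results), "requests served with", pool.opened, "connections")
```

   The queue is what makes this safe: a thread that finds no idle
   connection simply waits until another thread returns one.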

* RDBMS:

   * Relational DBs are best conceived as a set of related
      spreadsheets.  They are composed of columns representing certain
      types of data and rows of data which meet the columns'
      requirements.

   * Most modern relational databases are filled and accessed via a
      declarative language called SQL.  (Declarative languages are
      ones where you state the information you want and the system
      figures out how to retrieve the information .. you don't have to
      know anything about storage.  Another example is XSLT).  A SQL
      standard is released every so often and most databases ignore
      part or most of the standard.  ORACLE implements outer-joins
      using a non-standard syntax, PostGres just recently implemented
      outer-joins at all, and MySQL is missing so many features that
      MySQL and SQL-92 compliant shouldn't be mentioned in the same
      breath (although it is improving by leaps and bounds).

   * Anatomy of SQL:

      <put in a few SQL examples.  Explain select, where, joins, outer
      joins, group by and order by>  
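
      A possible set of examples, sketched with Python's built-in
      sqlite3 module so it can be run anywhere.  The users and
      postings tables and their contents are invented for
      illustration; they are not part of the ACS data model.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    create table users (user_id integer primary key, name text);
    create table postings (posting_id integer primary key,
                           user_id integer references users,
                           topic text);
    insert into users values (1, 'alice'), (2, 'bob'), (3, 'carol');
    insert into postings values (10, 1, 'cameras'), (11, 1, 'lenses'),
                                (12, 2, 'cameras');
""")

# select + where: choose columns, keep only the rows that match
cameras = db.execute(
    "select posting_id from postings where topic = 'cameras' order by posting_id"
).fetchall()

# (inner) join: only users who have at least one posting appear
joined = db.execute("""
    select u.name, p.topic
      from users u join postings p on p.user_id = u.user_id
     order by u.name, p.topic
""").fetchall()

# outer join + group by + order by: every user, even those with no postings
counts = db.execute("""
    select u.name, count(p.posting_id) as n
      from users u left outer join postings p on p.user_id = u.user_id
     group by u.name
     order by n desc, u.name
""").fetchall()

print(cameras)
print(joined)
print(counts)
```

      Note that carol only shows up in the last query: the inner join
      drops users without postings, the outer join keeps them with a
      count of zero.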

    http://philip.greenspun.com/sql
      is how I learned SQL.

     Visual tools like those found in ACCESS attempt to hide the
     complexity of SQL from end users.  Take a little while to learn
     SQL and you'll be infinitely better off than suffering through
     those cumbersome tools (which prove inadequate when writing a 250
     line query).

   * The most commonly deployed RDBMS is MySQL, for two reasons:
        1.  It is OSS and totally free
        2.  It is possibly the fastest of the major databases when it
            comes to reads.  The price of this feat is that it has problems
            with large quantities of writes, and it is still working on
            becoming ACID compliant.  IMHO it is the best read-only database,
            but I would never use it for mission critical applications.

   * I've mentioned ACID compliance twice.  What is ACID compliance and
     why should anyone care?  

   Atomicity

     Results of a transaction's execution are either all committed or
     all rolled back. All changes take effect, or none do. That means,
     for Joe User's money transfer, that both his savings and checking
     balances are adjusted or neither are.

   Consistency

     The database is transformed from one valid state to another valid
     state. This defines a transaction as legal only if it obeys
     user-defined integrity constraints. Illegal transactions aren't
     allowed and, if an integrity constraint can't be satisfied then
     the transaction is rolled back. For example, suppose that you
     define a rule that, after a transfer of more than $10,000 out of
     the country, a row is added to an audit table so that you can
     prepare a legally required report for the IRS. Perhaps for
     performance reasons that audit table is stored on a separate disk
     from the rest of the database. If the audit table's disk is
     off-line and can't be written, the transaction is aborted.

   Isolation

     The results of a transaction are invisible to other transactions
     until the transaction is complete. For example, if you are
     running an accounting report at the same time that Joe is
     transferring money, the accounting report program will either see
     the balances before Joe transferred the money or after, but never
     the intermediate state where checking has been credited but
     savings not yet debited.

   Durability

     Once committed (completed), the results of a transaction are
     permanent and survive future system and media failures. If the
     airline reservation system computer gives you seat 22A and
     crashes a millisecond later, it won't have forgotten that you are
     sitting in 22A and also give it to someone else. Furthermore, if
     a programmer spills coffee into a disk drive, it will be possible
     to install a new disk and recover the transactions up to the
     coffee spill, showing that you had seat 22A.

     * pilfered from http://philip.greenspun.com/panda/databases-choosing
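
   To make atomicity concrete, here is a small sketch of Joe's
   transfer using Python's sqlite3 module.  The accounts table, the
   "balance >= 0" constraint, and the transfer function are
   assumptions invented for the example.  The second transfer credits
   checking, then fails the constraint on savings, and the whole
   transaction is rolled back.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    create table accounts (name text primary key,
                           balance integer check (balance >= 0));
    insert into accounts values ('savings', 100), ('checking', 0);
""")


def transfer(amount):
    """Move money from savings to checking; all or nothing."""
    try:
        with db:  # commits on success, rolls back if an exception escapes
            db.execute("update accounts set balance = balance + ? "
                       "where name = 'checking'", (amount,))
            db.execute("update accounts set balance = balance - ? "
                       "where name = 'savings'", (amount,))
        return True
    except sqlite3.IntegrityError:
        return False


def balances():
    return dict(db.execute("select name, balance from accounts"))


ok = transfer(30)
after_ok = balances()
failed = transfer(500)        # would overdraw savings
after_failed = balances()     # the credit to checking was undone too
print(ok, after_ok)
print(failed, after_failed)
```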

  Maybe I haven't convinced you that an ACID compliant database is
  important.  After all, major software companies use MySQL all the
  time.  You're just going to have to accept that I am right and they
  are wrong.  Once you've worked with a fully ACID compliant database
  it becomes nerve-racking to work on a non-ACID dbms.

 * Locking:

    Databases as feature-light as MySQL and as expensive as MS
    SQLServer use a "write-unfriendly" technique known as "table-level
    locking".  When a user is attempting to alter the data on one row
    of a table the whole table becomes locked to all other db-users.
    This is fine when you have 1 writer, but when you are trying to
    handle 20 concurrent writes you will inevitably end up with a
    queue of people waiting to write.  To make matters worse, people
    who want to read are waiting for the writers.  PostgreSQL and
    ORACLE use row-level locking, which locks only the affected rows,
    and only against other writers, thereby better supporting high
    numbers of concurrent readers and writers.

 * Indexes:

    Any large database requires careful management of indices.
    Usually when trying to run a query the database's query planner
    will figure out how to best use existing indices to accelerate
    your query.  If you run a query against a 20 million row table and
    one of your join criteria runs against data not in an index you
    may be in for a long wait.  If you add an appropriate index
    (appropriate being a key word) you can cut a query which takes 30
    minutes down to one which takes 2 seconds.
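
    A small way to watch the query planner change its mind, sketched
    with Python's sqlite3 module (SQLite rather than PostgreSQL, so
    the plan text will differ from what EXPLAIN prints there).  The
    users table and the index name are made up for the example.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("create table users (user_id integer primary key, email text, name text)")
db.executemany("insert into users (email, name) values (?, ?)",
               [(f"user{i}@example.com", f"User {i}") for i in range(1000)])


def plan(sql):
    """Return the planner's description of how it would run the query."""
    rows = db.execute("explain query plan " + sql).fetchall()
    return " / ".join(row[-1] for row in rows)   # last column is the detail text


query = "select name from users where email = 'user500@example.com'"
before = plan(query)     # no index on email: the whole table is scanned
db.execute("create index users_email_idx on users (email)")
after = plan(query)      # now the planner searches via the new index

print("before:", before)
print("after: ", after)
```

    On a thousand rows the difference is invisible; on twenty million
    it is the difference between a full scan and a handful of index
    lookups.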

    -- discuss the dangers of over/improper indexing?

    -- discuss how queries are executed (parse, analyze, plan,
       implement, return cursor)?

* Triggers:
   
    I mentioned triggers earlier.  These are event listeners which
    fire when a certain event occurs.  Triggers are used to maintain
    auditing tables, data integrity (if a row is added or removed
    from one table you may want to seamlessly add or remove rows from
    another table), and a variety of other things.
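
    An audit-table trigger might look roughly like this.  The sketch
    uses SQLite's trigger syntax through Python's sqlite3 module
    (PostgreSQL and ORACLE spell triggers differently), and the table
    and trigger names are invented for the example.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    create table accounts (name text primary key, balance integer);
    create table accounts_audit (name text, old_balance integer,
                                 new_balance integer);

    -- fires once for every row whose balance is updated
    create trigger accounts_audit_tr after update of balance on accounts
    begin
        insert into accounts_audit values (old.name, old.balance, new.balance);
    end;

    insert into accounts values ('checking', 100);
""")

# The application only touches accounts; the audit row appears by itself.
db.execute("update accounts set balance = 250 where name = 'checking'")
audit_rows = db.execute("select * from accounts_audit").fetchall()
print(audit_rows)
```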
   
    -- discuss the dangers of triggers?

PostGres:

   * http://developer.postgresql.org/docs/postgres/history.html

   Postgres was originally developed as a project at the University of
   California, Berkeley in 1986.  As the project grew in scope it was
   used by a jet engine performance package, an asteroid tracking db,
   and many other projects.  By 1993, with a solid core of developers
   (outside the school) and with mounting maintenance issues they
   decided to end the school's work with the project.  In 1994 a SQL
   interpreter was added to PostGres and released to the web as
   PostGres95.  To prevent the need to rename the project every year
   the project was renamed in 1996 to PostgreSQL.  Since then they
   have continuously added new features to the product making it the
   most ANSI compliant of all the OSS databases.


   * PostgreSQL issues/capabilities.

   ORACLE allows you to write stored procedures in a "Procedural"
   version of SQL called (appropriately) PL/SQL.  PostgreSQL does
   ORACLE one better allowing you to write stored procedures in most
   of the major programming languages (C, C++, PERL, Python, TCL,
   Java).

   The products of the major RDBMS vendors (meaning ORACLE and IBM/DB2)
   have several advantages over PostgreSQL.  The biggest one is that
   they are better with enormous data sets and data flows.  Their
   other advantage is with clustering/distribution/mirroring.  ORACLE
   has a setup to optimize data security as well as accelerating
   writes and reads.  It only requires 44 SCSI disks.  It distributes
   its many different functions over 22 disks and then mirrors
   each of them.  With these sorts of optimizations they should open
   source their database and go into the hardware business.

-- Should probably restructure this



ACS 4.x configuration

   
          

AOLServer


    *  ${NSHOME}/your-server.tcl 
       
       most acs configurations set NSHOME=/home/aolserver but I
       prefer NSHOME=/etc/aolserver

          when you start aolserver the command is
 
          aolserver -u (user) -g (group) -it your-server.tcl

          -i = installed mode -- logging is done in your defined log file
          -f = foreground mode -- piped to stdout
          -t = you are using the new-school .tcl files.  They used to
               have a .ini structure which sucked and was replaced by
               the excellent .tcl method.


    * ${ACS_LOG_DIR}/access.log and error.log

      a standard aolserver/acs install will create two logs: an access
      log, which stores the individual requests in date order, and an
      error log.
        
192.168.125.30 - - [15/Apr/2003:18:28:15 +0500] "GET /site-info?email=tristancohen%40yahoo%2ecom HTTP/1.1" 200 0 "http://192.168.124.8/create-administrator" "Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.0.1) Gecko/20030306 Camino/0.7"

      is a standard access log entry.

      192.168.125.30 ::  Requesting IP
   
      [15/Apr/2003:18:28:15 +0500] :: Date (duh)

      "GET /site-info?email=tristancohen%40yahoo%2ecom HTTP/1.1" :: Abbreviated HTTP request
 
      for a great reference on HTTP protocol check out:

      http://www.w3.org/Protocols/ (if you are a masochist)

      or
  
      http://www.jmarshall.com/easy/http/ (I haven't read it yet
      .. but it seems good)

      200 :: HTTP response code

      0   :: Size of the response body in bytes

      "http://192.168.124.8/create-administrator"  :: referring URL (sent by 
         user's browser)

      "Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.0.1) 
       Gecko/20030306 Camino/0.7"  :: Information on the requesting client (sent
          by user's browser)
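
      If you want to pull these fields apart programmatically, a
      regular expression is usually enough.  The sketch below is in
      Python and assumes the log follows the common "combined" log
      format, which the sample entry above appears to match (in that
      format the number after the status code is the response size).
      The field names are my own labels.

```python
import re

# Combined log format:
#   ip ident user [date] "request" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<date>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse(line):
    """Split one access-log line into named fields, or return None."""
    m = LOG_LINE.match(line)
    if m is None:
        return None
    entry = m.groupdict()
    parts = entry["request"].split(" ", 2)
    if len(parts) == 3:                      # "METHOD url PROTOCOL"
        entry["method"], entry["url"], entry["protocol"] = parts
    entry["status"] = int(entry["status"])
    return entry


sample = ('192.168.125.30 - - [15/Apr/2003:18:28:15 +0500] '
          '"GET /site-info?email=tristancohen%40yahoo%2ecom HTTP/1.1" 200 0 '
          '"http://192.168.124.8/create-administrator" '
          '"Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.0.1) '
          'Gecko/20030306 Camino/0.7"')
entry = parse(sample)
print(entry["ip"], entry["method"], entry["url"], entry["status"])
```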

    * ${NSHOME}/modules/tcl :: Globally deployed TCL modules (part of
                               the standard aolserver release).
                               Includes the http tcl module which is
                               very handy if you want to have your
                               server do HTTP POST operations (these
                               are not available through the standard
                               aolserver TCL API)

    * ${ACS_HOME}/your-server  

       the standard acs install puts all your acs files under
       /web/your-server . If you are more into the standard debian
       setup you can put it under /var/aolserver/your-server .


ACS Directory Structure:

          * ./www :: The directory where your standard ACS pages are
                     located.  During the old 3.x days your packages
                     and their pages were under this directory.  With
                     the advent of the new packaging system only a few
                     very standard pages are under this directory.
                     Most are now located under the packages
                     directory.

          * ./tcl :: This directory stores all the bootstrapping
                     libraries for the system.  During the 3.x days
                     all the libraries were placed under this
                     directory.

          * ./packages :: This is the main directory which stores all
                          the apm's (ACS Packages).  Their format
                          mirrors the base server format.

               * ./package-name :: the subdirectory for a particular
                 package.

 
                  * ./www :: The pages for the package are located
                             under this directory.
                  
                  * ./tcl :: The libraries to be loaded at startup are
                             under this directory.

                  * ./sql :: The data model for this package is
                             located here.


                        * ./oracle :: Under this subdirectory is the
                                        data model as set up for
                                        ORACLE.

                        * ./postgres :: This subdir houses the dm for
                                        postgresql.

                  * ./package-name.info :: XML file describing the
                                           required dependencies and
                                           the files w/in the package.


ACS File Types:


   * .sql :: The SQL scripts used for package installation.

   * /tcl/*.tcl :: These files (sourced in "ls" order) contain the
                   libraries loaded at system startup time.

   * /www/*.tcl :: These files are the scripts which assemble the data
                   for a template.

   * /www/*.adp :: These are the standard templates for any individual
                   package.
                   

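   * /(tcl|www)/*.xql :: The OpenACS people have decided to keep ORACLE
                   support alongside PostgreSQL through a "crude" method
                   of database abstraction: the SQL queries live in
                   these XML files, separate from the TCL code, so each
                   database can have its own version of a query.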

   Anatomy:

   /tcl/*.tcl

Excerpt from ${ACSHOME}/packages/acs-tcl/tcl/object-procs.tcl


ad_library {

    Object support for ACS.

    @author Jon Salz (jsalz@arsdigita.com)
    @creation-date 11 Aug 2000
    @cvs-id $Id: object-procs.tcl,v 1.2 2002/09/10 22:22:14 jeffd Exp $

}

     ad_library stores some basic text in a javadoc-esque format.
     When the library is loaded, and if documentation creation is
     enabled, then the block of text in {}'s is written into a
     documentation file.  Similarly ad_proc documents each of the
     proc's inside the libraries.


ad_proc -private acs_lookup_magic_object { name } {

    Returns the object ID of a magic object (performing no memoization).

} {
    return [db_string magic_object_select {
	select object_id from acs_magic_objects where name = :name
    }]
}

     ad_proc is a replacement for the TCL standard proc.  It allows
     you to mark the procedure as public or private.  db_string
     uses the ACS db_api discussed later.  There are many other procs
     inside this file.

Excerpt from ${ACSHOME}/packages/lars-blogger/www/index.tcl

     Show them this page in action then step through the source-code.

Excerpt from ${ACSHOME}/packages/lars-blogger/www/index.adp

     Show them the page and explain the template.

OpenACS from Startup to Page request:

  OpenACS starts up with AOLServer starting up.  The first thing
  aolserver does is load its compiled modules.  A working version of
  OpenACS requires:

      nssock.so    (socket listener)
      nslog.so     (logger)
      postgres.so  (postgres driver)
      nssha1.so    (encryption library used for encrypting passwords etc.)
      nscache.so   (caching system which allows blocks of text to be shared between threads)
      nsrewrite.so (similar to mod_rewrite for apache .. used by the request processor to
                    internally redirect requests).
      nsxml.so     (link between aolserver and libxml2)
      nsssl.so     (the secure socket listener .. if you aren't worried about security
                    this is unnecessary, but all real webservices should be)

   All of the libraries are written in C for speed.  Many of them
   don't come w/ the standard AOLServer distribution and must be
   compiled separately.

   From there it loads the libraries under $ACS_HOME/tcl .  These libraries
   tell it to start up the bootstrapping package.

   The bootstrapping package includes all the basic libraries that
   every package needs to load.  These include the db-api, the package
   manager, the utilities, and the procs for making procs.
 
   From there it:

       * Connects to database to make sure it works as expected.

       * Loads up the acs-tcl package which includes the core tcl
         packages which every package on the system can require.

       * Checks the database to see what other packages are currently
         installed (if none then starts the installation process).
         
         -- Demonstrate the installation process?  If I can get it  to
            work on my MacTop.

       * Load the site map, confirm that there is an admin user

       * Load some basic parameters

       * Load the libraries of the individual packages

       * Register filters necessary for the Request Processor
 
       * Binds to the sockets
 
       * Starts to run scheduled procedures (a facility of AOLserver).

Receiving Requests.

   ACS 3.x URLs used to include file extensions. 

   * /pants/buckle.html :: loads the static html 
                           ${ACSHOME}/www/pants/buckle.html

   * /pants/zipper.tcl :: ${ACSHOME}/www/pants/zipper.tcl .. runs a
                          script which would ns_write a response to
                          the request.  Until the script ends the
                          connection is held open, so you can keep
                          writing to the connection as many times as
                          you want, extending the data.  This is
                          typically bad policy, but is good if you
                          want to give an interactive feeling when
                          reporting on the progress of some action
                          .. like installation.
    
  
  * /pants/leg.adp :: ${ACSHOME}/www/pants/leg.adp is an AOLServer
                      template.  These allow programmers to mix code and
                      HTML in the same files .. similar to JSP.

  It should be noted that .adp files are completely inadequate to
  their purpose.  I've worked with several template designers who were
  uncomfortable with the mix of code and static HTML.  There were many
  real problems with this system and it was not as capable as the JSP
  syntax which allowed you to have a chunk of HTML in a loop (for
  instance).  Eventually a new templating system was created (shown
  later) which made it easier for the graphic designers to move
  variables and html around.  It is completely custom to the ACS and
  suffers because of that.  Some of us are pushing for them to adopt a
  more standard HTML::Template syntax based on a PERL module.  It has
  been ported to PHP, Java, Python, etc. etc.

  ACS 4.x :

  There was a big push at aD in mid 2000 for what are called abstract
  URLs.  This would allow you to eliminate file extensions and have a
  request processor cleverly figure out what pages to load.  This was
  done to hide implementation details from users and prevent their
  bookmarks from breaking when you altered implementation.  ACS 3.x
  had a hack to support this .. but it wasn't put into mainstream
  usage until 4.x.

  With the birth of the request processor, the site map, abstract urls,
  and packages a whole new level of overhead was added to each request.
  
  When a request is received for /ticket-tracker (for example) the
  request processor receives the call.  

  IMP Detail:
  
     The request processor uses an aolserver facility called filters.
     Filters are assigned to url patterns (in the case of the RP the
     pattern is /*).  The filter then runs some code before exiting
     with a filter code.  FILTER_OK means run the next filter or load
     the page.  FILTER_BREAK means don't run any more filters, but
     still load the page.  FILTER_RETURN means close the connection;
     the filter has done all the work.
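
     The three return codes can be modelled in a few lines.  This is
     a Python sketch of the control flow only, not AOLserver's API:
     the handle_request function and the sample filters are invented,
     and URL pattern matching is left out (every filter is treated as
     if it were registered on /*).

```python
FILTER_OK = "filter_ok"
FILTER_BREAK = "filter_break"
FILTER_RETURN = "filter_return"


def handle_request(url, filters, load_page):
    """Run filters in order, then load the page unless a filter says not to."""
    trace = []
    for name, fn in filters:
        code = fn(url)
        trace.append(name)
        if code == FILTER_RETURN:    # the filter already answered the request
            return trace, None
        if code == FILTER_BREAK:     # skip remaining filters, still load page
            break
    return trace, load_page(url)


def log_filter(url):
    return FILTER_OK


def auth_filter(url):
    # Pretend /admin is off limits and the filter sends its own response.
    return FILTER_RETURN if url.startswith("/admin") else FILTER_OK


def shortcut_filter(url):
    return FILTER_BREAK if url == "/fast" else FILTER_OK


def last_filter(url):
    return FILTER_OK


filters = [("log", log_filter), ("auth", auth_filter),
           ("shortcut", shortcut_filter), ("last", last_filter)]


def load_page(url):
    return "page:" + url


normal = handle_request("/index", filters, load_page)
denied = handle_request("/admin/users", filters, load_page)
fast = handle_request("/fast", filters, load_page)
print(normal, denied, fast, sep="\n")
```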

  The request processor has to do several things:

    * check to see if ${ACS_HOME}/www/ticket-tracker.* exists .. if it doesn't then

    * figure out which package is loaded at /ticket-tracker/ by
      checking the site map.

    * since there is an instance of ticket-tracker mounted at
      /ticket-tracker we then look for
      ${ACS_HOME}/packages/ticket-tracker/www/index.*

    * since there is an index.tcl we load that.  

    * The RP also stores some variables about the package_id on the
      map, the site_node_id, the user's form variables, and the user's
      cookies.

    * The RP will try to load files by extension in the order: tcl,
      adp, html, htm, and then whatever comes first alphabetically.
      (Abstract URL support)
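
  The extension search in the last step can be sketched as follows.
  This is a Python illustration of the ordering rule only; the
  resolve function is hypothetical, it takes a plain list of file
  names instead of reading a directory, and the real request
  processor may well exclude some extensions that this sketch would
  accept.

```python
import os

# Extensions the request processor prefers, in order.
PREFERRED = ["tcl", "adp", "html", "htm"]


def resolve(abstract_path, files):
    """Pick the concrete file for an extension-less path.

    `files` is the list of file names present in the target directory.
    """
    base = os.path.basename(abstract_path)
    candidates = {}
    for name in files:
        stem, ext = os.path.splitext(name)
        if stem == base and ext:
            candidates[ext[1:]] = name       # key by extension without the dot
    for ext in PREFERRED:
        if ext in candidates:
            return candidates[ext]
    if candidates:
        return candidates[min(candidates)]   # alphabetically first extension
    return None


print(resolve("/ticket-tracker/index", ["index.adp", "index.tcl"]))
print(resolve("/pants/buckle", ["buckle.htm", "buckle.html"]))
print(resolve("/reports/summary", ["summary.pdf", "summary.csv"]))
```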

Site Map:

  Each package can be instantiated.  Package instances can be mounted
  on the site map.  Some packages are developed so that different
  instances can share the same tables but are completely invisible to
  each other.  If you are setting up a group of bboards for different 
  communities .. you want the communities to all have bboard but not 
  to read each other's bboard.  A package instance can be mapped to 
  any # of site-nodes.



How to build applications
What the advantages are over starting itself
Weaknesses/Advantages

architecture point of view



Pictures
by admin last modified 2003-04-28 10:44
