openacs-presentation.txt
http://philip.greenspun.com/panda/
Guide to Web Publishing
History of OpenACS:
http://openacs.org/about/history (official history)
* Originally was the ACS
* arsDigita, a company founded by an MIT professor, aimed to create
an open source web community toolkit.
* business model was to build online services using this toolkit.
* Development Gateway (WorldBank) www.developmentgateway.org - ACS
* Knowledge Management System for Siemens
Corporation. (intranet application) - OpenACS/ACS hybrid
* Deutsche Bank Intranet - ACS
* site59.com: Last Minute Travel site (www.site59.com) - ACS
* scorecard.org: Environmental site which at one point served
30 db-backed page hits a day on an old Sun Pizza Box
(Sun UltraSparc II) proto-ACS
* photo.net: Community Site for camera enthusiasts serves
hundreds of thousands of hits a day. (www.photo.net) - ACS
* architecture used ORACLE
* it was proprietary so it was discouraging for OSS enthusiasts
* ORACLE had free licensing for non-profit projects
* at the time of development ORACLE was the only decent ACID
(*ACID ref*) database on the market (besides the even more
expensive DB2). There were no acceptable OSS alternatives.
* Until PostgreSQL 7.0, ACID was not available on an OSS DB. With
PostgreSQL 7.0, ORACLE was no longer necessary.
* brief software history
* originally a group of utilities written for Hearst
Publishing.
* eventually a set of services came out of the system including
bboard, classified, and neighbor-to-neighbor
* as a project which evolved w/ little to no planning it had
next to no software architecture. It was just a "pile of
code" without any sense of standards or large-scale design.
Lots of duplicated code; it defied any modern sense of an
engineering standard.
* Originally, OpenACS was just a port of this pile of code.
* ACS 4.0 was an attempt to build a system with more
integration and services .. it would have a concept of a
kernel and core services on which to build applications.
* Current version of OpenACS 4.x is a port of this code and it
improves on it since arsDigita later abandoned TCL ACS.
* OpenACS was started as a project on SourceForge in Dec. 1999 to
port ACS to Postgres (Originally the project was called ACS/pg).
At this time I was at home doing the problem-sets *psets-link* so
that I could work at aD.
* arsDigita eventually moved to a Java platform and was bought by
RedHat. The new RedHat CCM *link* is just the ACS-Java
rebranded.
Basic Architecture:
AOLserver -- ns_db API (custom postgres driver) -- PGsql
    |                                                |
TCL Libraries                                   Tables/Data
TCL Scripts
ADP templates
Images
Static Files
* AOLServer
* Used by AOL to serve all of its web content. Developed as
NaviServer back when Netscape was at version 0.9, and later
open sourced as AOLServer. (If you use it, lots of items refer to
nsd, the NaviServer daemon .. just a historical detail.)
* OSS multi-threaded webserver written in C (and extensible in C)
* tightly integrated with TCL (a lightweight interpreted scripting language)
* well defined database API which utilized database pooling
* database pooling is important because the cost of opening new
connections to a database is reduced to near zero: the server
opens a set of connections once, and request threads borrow
and return them.
* database pooling relies on the multi-threaded nature of the
webserver.
* Apache, the #1 webserver in the world (also OSS) has recently
moved from a multi-process server to a multi-threaded server
with Apache 2.x. Apache is not tightly integrated w/
anything .. which has its advantages and disadvantages.
* Threading is important because the cost of starting and
maintaining threads is much lower than for processes. It also
allows easy sharing of data between threads instead of
needing to use IPC-based methods.
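A minimal sketch of the pooling idea, in Python rather than AOLserver's C/TCL. The ConnectionPool class and handle_request function are made up for illustration, and SQLite stands in for the real database. Two connections are opened once and shared by ten request threads.

```python
import queue
import sqlite3
import threading


class ConnectionPool:
    """Open a fixed number of connections once; threads borrow and return them."""

    def __init__(self, size, database):
        self._pool = queue.Queue()
        self.opened = 0
        for _ in range(size):
            # check_same_thread=False lets whichever thread borrows the
            # connection use it, which is the whole point of a shared pool.
            self._pool.put(sqlite3.connect(database, check_same_thread=False))
            self.opened += 1

    def run(self, sql):
        conn = self._pool.get()      # blocks until a connection is free
        try:
            return conn.execute(sql).fetchone()[0]
        finally:
            self._pool.put(conn)     # hand the connection back, never close it


pool = ConnectionPool(size=2, database=":memory:")
results = []
results_lock = threading.Lock()


def handle_request(n):
    # Each "request" runs one trivial query on a borrowed connection.
    value = pool.run(f"select {n} * 2")
    with results_lock:
        results.append(value)


threads = [threading.Thread(target=handle_request, args=(n,)) for n in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results), "served by", pool.opened, "connections")
```

The threads share the pool object directly, which is the kind of data sharing that would need IPC in a multi-process server.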
* RDBMS:
* Relational DBs are best conceived as a set of related
spreadsheets. They are composed of columns representing certain
types of data and rows of data which meet the columns'
requirements.
* Most modern relational databases are filled and accessed via a
declarative language called SQL. (Declarative languages are
ones where you state the information you want and the system
figures out how to retrieve the information .. you don't have to
know anything about storage. Another example is XSLT). A SQL
standard is released every so often and most databases ignore
part or most of the standard. ORACLE implements outer-joins
using a non-standard syntax, PostGres just recently implemented
outer-joins at all, and MySQL is missing so many features that
MySQL and SQL-92 compliant shouldn't be mentioned in the same
breath (although it is improving by leaps and bounds).
* Anatomy of SQL:
<put in a few SQL examples. Explain select, where, joins, outer
joins, group by and order by>
http://philip.greenspun.com/sql
is how I learned SQL.
Visual tools like those found in ACCESS attempt to hide the
complexity of SQL from end users. Take a little while to learn
SQL and you'll be infinitely better off than suffering through
those cumbersome tools (which prove inadequate when writing a 250
line query).
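A few SQL examples covering select, where, joins, outer joins, group by and order by. This is a sketch only: the users and postings tables are invented for illustration, and the queries run against an in-memory SQLite database through Python's sqlite3 module rather than against ORACLE or PostgreSQL.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    create table users (user_id integer primary key, name text);
    create table postings (posting_id integer primary key,
                           user_id integer, topic text);
    insert into users values (1, 'alice'), (2, 'bob'), (3, 'carol');
    insert into postings values (10, 1, 'cameras'), (11, 1, 'lenses'),
                                (12, 2, 'cameras');
""")

# select + where + order by: state which rows you want, not how to find them.
where_rows = db.execute(
    "select name from users where user_id > 1 order by name").fetchall()

# join: only users who have at least one posting appear.
inner_rows = db.execute("""
    select u.name, p.topic
    from users u join postings p on p.user_id = u.user_id
    order by u.name, p.topic""").fetchall()

# outer join + group by: every user appears, even with zero postings.
count_rows = db.execute("""
    select u.name, count(p.posting_id)
    from users u left outer join postings p on p.user_id = u.user_id
    group by u.name
    order by u.name""").fetchall()

print(where_rows)
print(inner_rows)
print(count_rows)
```

The outer join is what keeps carol in the last result with a count of 0; the plain join drops her.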
* The most commonly deployed RDBMS is MySQL, for several reasons:
1. It is OSS and totally free
2. It is possibly the fastest of the major databases when it
comes to reads. In order to accomplish this feat it has problems
with large quantities of writes, and it is still working on
becoming ACID compliant. IMHO it is the best read-only database,
but I would never use it for mission critical applications.
* I've mentioned ACID compliance twice, what is ACID compliance and
why should anyone care?
Atomicity
Results of a transaction's execution are either all committed or
all rolled back. All changes take effect, or none do. That means,
for Joe User's money transfer, that both his savings and checking
balances are adjusted or neither are.
Consistency
The database is transformed from one valid state to another valid
state. This defines a transaction as legal only if it obeys
user-defined integrity constraints. Illegal transactions aren't
allowed and, if an integrity constraint can't be satisfied then
the transaction is rolled back. For example, suppose that you
define a rule that, after a transfer of more than $10,000 out of
the country, a row is added to an audit table so that you can
prepare a legally required report for the IRS. Perhaps for
performance reasons that audit table is stored on a separate disk
from the rest of the database. If the audit table's disk is
off-line and can't be written, the transaction is aborted.
Isolation
The results of a transaction are invisible to other transactions
until the transaction is complete. For example, if you are
running an accounting report at the same time that Joe is
transferring money, the accounting report program will either see
the balances before Joe transferred the money or after, but never
the intermediate state where checking has been credited but
savings not yet debited.
Durability
Once committed (completed), the results of a transaction are
permanent and survive future system and media failures. If the
airline reservation system computer gives you seat 22A and
crashes a millisecond later, it won't have forgotten that you are
sitting in 22A and also give it to someone else. Furthermore, if
a programmer spills coffee into a disk drive, it will be possible
to install a new disk and recover the transactions up to the
coffee spill, showing that you had seat 22A.
* pilfered from http://philip.greenspun.com/panda/databases-choosing
Maybe I haven't convinced you that an ACID compliant database is
important. After all, major software companies use MySQL all the
time. You're just going to have to accept that I am right and they
are wrong. Once you've worked with a fully ACID compliant database
it becomes nerve-racking to work on a non-ACID dbms.
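A small sketch of atomicity and consistency, using Joe User's money transfer. The accounts table, the check constraint and the transfer function are assumptions made for illustration, and SQLite (via Python's sqlite3) stands in for a production database. When the second update breaks the constraint, the first update is rolled back with it.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""create table accounts (
                  name text primary key,
                  balance integer check (balance >= 0))""")
db.execute("insert into accounts values ('savings', 100), ('checking', 0)")
db.commit()


def transfer(amount):
    """Move money from savings to checking; both updates happen or neither."""
    try:
        with db:  # commits if the block succeeds, rolls back if it raises
            db.execute("update accounts set balance = balance + ? "
                       "where name = 'checking'", (amount,))
            db.execute("update accounts set balance = balance - ? "
                       "where name = 'savings'", (amount,))
        return True
    except sqlite3.IntegrityError:
        # The integrity constraint could not be satisfied, so the
        # credit to checking was undone along with the failed debit.
        return False


def balances():
    return dict(db.execute("select name, balance from accounts"))


first = transfer(60)     # legal: savings has enough money
after_first = balances()
second = transfer(500)   # illegal: would drive savings below zero
after_second = balances()

print(first, after_first)
print(second, after_second)
```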
* Locking:
Databases as feature-light as MySQL and as expensive as MS
SQLServer use a "write-unfriendly" technique known as "table-level
locking". When a user is attempting to alter the data on one row
of a table the whole table becomes locked to all other db-users.
This is fine when you have 1 writer, but when you are trying to
handle 20 concurrent writes you will inevitably end up with a
queue of people waiting to write. To make matters worse people
who want to read are waiting for the writers. PostGreSQL and
ORACLE use row-level locking and only lock the rows to other
writers thereby better supporting high numbers of concurrent
readers and writers.
* Indexes:
Any large database requires careful management of indices.
Usually when trying to run a query the database's query planner
will figure out how to best use existing indices to accelerate
your query. If you run a query against a 20 million row table and
one of your join criteria runs against data not in an index you
may be in for a long wait. If you add an appropriate index
(appropriate being a key word) you can cut a query which takes 30
minutes down to one which takes 2 seconds.
-- discuss the dangers of over/improper indexing?
-- discuss how queries are executed (parse, analyze, plan,
implement, return cursor)?
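A sketch of how an index changes the query plan. The orders table and index name are made up for illustration, and SQLite's "explain query plan" is used here as a stand-in; ORACLE and PostgreSQL have their own explain output with different wording.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""create table orders (
                  order_id integer primary key,
                  customer_id integer,
                  total integer)""")
db.executemany("insert into orders (customer_id, total) values (?, ?)",
               [(n % 1000, n) for n in range(20000)])


def plan(sql):
    # The last column of each plan row is the human-readable step.
    rows = db.execute("explain query plan " + sql)
    return " / ".join(row[-1] for row in rows)


query = "select total from orders where customer_id = 42"

plan_before = plan(query)   # no usable index: the whole table is scanned
db.execute("create index orders_customer_idx on orders (customer_id)")
plan_after = plan(query)    # the planner now searches via the index

print("before:", plan_before)
print("after: ", plan_after)
```

The planner picks up the new index on its own; the query text does not change.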
* Triggers:
I mentioned triggers earlier. These are event listeners which
fire when a certain event occurs. Triggers are used to maintain
auditing tables, data integrity (if a row is added or removed
from one table you may want to seamlessly add or remove rows from
another table), and a variety of other things.
-- discuss the dangers of triggers?
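A sketch of a trigger that maintains an auditing table. The table and trigger names are invented for illustration and the syntax is SQLite's (run through Python's sqlite3); ORACLE and PostgreSQL triggers follow the same idea with different syntax.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    create table accounts (name text primary key, balance integer);
    create table accounts_audit (name text,
                                 old_balance integer,
                                 new_balance integer);

    -- Fires after every balance update and records the change.
    create trigger accounts_audit_tr
    after update of balance on accounts
    begin
        insert into accounts_audit
        values (old.name, old.balance, new.balance);
    end;

    insert into accounts values ('savings', 100);
    update accounts set balance = 40 where name = 'savings';
""")

# The application never wrote to accounts_audit; the trigger did.
audit_rows = db.execute("select * from accounts_audit").fetchall()
print(audit_rows)
```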
PostGres:
* http://developer.postgresql.org/docs/postgres/history.html
Postgres was originally developed as a project at the University of
California, Berkeley in 1986. As the project grew in scope it was
used by a jet engine performance package, an asteroid tracking db,
and many other projects. By 1993, with a solid core of developers
(outside the school) and with mounting maintenance issues they
decided to end the school's work with the project. In 1994 a SQL
interpreter was added to Postgres and it was released to the web as
Postgres95. To prevent the need to rename the project every year
the project was renamed in 1996 to PostgreSQL. Since then they
have continuously added new features to the product making it the
most ANSI compliant of all the OSS databases.
* PostgreSQL issues/capabilities.
ORACLE allows you to write stored procedures in a "Procedural"
version of SQL called (appropriately) PL/SQL. PostgreSQL does
ORACLE one better allowing you to write stored procedures in most
of the major programming languages (C, C++, PERL, Python, TCL,
Java).
The major RDBMS vendors' (meaning ORACLE and IBM/DB2) products have
several advantages over PostgreSQL. The biggest one is
that they are better with enormous data sets and data flows. Their
other advantage is with clustering/distribution/mirroring. ORACLE
has a setup to optimize data security as well as accelerating
writes and reads. It only requires 44 SCSI disks: it distributes
its many different functions over 22 disks and then mirrors
each of them. With these sorts of optimizations they should open
source their database and go into the hardware business.
-- Should probably restructure this
ACS 4.x configuration
AOLServer
* ${NSHOME}/your-server.tcl
most acs configurations set NSHOME=/home/aolserver but I
prefer NSHOME=/etc/aolserver
the command to start aolserver is
aolserver -u (user) -g (group) -it your-server.tcl
-i = inittab mode -- logging is done in your defined log file
-f = foreground mode -- piped to stdout
-t = the config file that follows is one of the new-school .tcl
files. They used to have a .ini structure which sucked and
was replaced by the excellent .tcl method.
* ${ACS_LOG_DIR}/access.log and error.log
a standard aolserver/acs install will create two logs.
1) an access log that stores the individual requests sorted by date.
192.168.125.30 - - [15/Apr/2003:18:28:15 +0500] "GET /site-info?email=tristancohen%40yahoo%2ecom HTTP/1.1" 200 0 "http://192.168.124.8/create-administrator" "Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.0.1) Gecko/20030306 Camino/0.7"
is a standard access log entry.
192.168.125.30 :: Requesting IP
- - :: Remote identity and authenticated user (both unset here)
[15/Apr/2003:18:28:15 +0500] :: Date (duh)
"GET /site-info?email=tristancohen%40yahoo%2ecom HTTP/1.1" :: Abbreviated HTTP request
for a great reference on HTTP protocol check out:
http://www.w3.org/Protocols/ (if you are a masochist)
or
http://www.jmarshall.com/easy/http/ (I haven't read it yet
.. but it seems good)
200 :: HTTP response code
0 :: Size of the response body in bytes
"http://192.168.124.8/create-administrator" :: referring URL (sent by
user's browser)
"Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.0.1)
Gecko/20030306 Camino/0.7" :: Information on the requesting client (sent
by user's browser)
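The field breakdown above can be checked with a short parser. This is a sketch in Python: the COMBINED pattern and its group names are my own, written for the common "combined" log layout that the sample entry follows, not taken from AOLserver.

```python
import re

# One named group per field of the access log entry.
COMBINED = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<date>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

line = ('192.168.125.30 - - [15/Apr/2003:18:28:15 +0500] '
        '"GET /site-info?email=tristancohen%40yahoo%2ecom HTTP/1.1" 200 0 '
        '"http://192.168.124.8/create-administrator" '
        '"Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.0.1) '
        'Gecko/20030306 Camino/0.7"')

entry = COMBINED.match(line).groupdict()

# The request field itself is "method url protocol".
method, url, protocol = entry["request"].split(" ")

for field in ("ip", "date", "status", "size", "referrer"):
    print(field, "::", entry[field])
print("method ::", method)
```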
* ${NSHOME}/modules/tcl :: Globally deployed TCL modules (part of
the standard aolserver release).
Includes the http tcl module which is
very handy if you want to have your
server do HTTP POST operations (these
are not available through the standard
aolserver TCL API)
* ${ACS_HOME}/your-server
the standard acs install puts all your acs files under
/web/your-server . If you are more into the standard debian
setup you can put it under /var/aolserver/your-server .
ACS Directory Structure:
* ./www :: The directory where your standard ACS pages are
  located. During the old 3.x days your packages
  and their pages were under this directory. With
  the advent of the new packaging system only a few
  very standard pages are under this directory.
  Most are now located under the packages
  directory.
* ./tcl :: This directory stores all the bootstrapping
  libraries for the system. During the 3.x days
  all the libraries were placed under this
  directory.
* ./packages :: This is the main directory which stores all
  the apm's (ACS Packages). Their format
  mirrors the base server format.
    * ./package-name :: the subdirectory for a particular
      package.
        * ./www :: The pages for the package are located
          under this directory.
        * ./tcl :: The libraries to be loaded at startup are
          under this directory.
        * ./sql :: The data model for this package is
          located here.
            * ./oracle :: Under this subdirectory is the
              data model as set up for
              ORACLE.
            * ./postgres :: This subdir houses the dm for
              postgresql.
        * ./package-name.info :: XML file describing the
          required dependencies and
          the files w/in the package.
ACS File Types:
* .sql :: The SQL scripts used for package installation.
* /tcl/*.tcl :: These files (sourced in "ls" order) contain the
libraries loaded at system startup time.
* /www/*.tcl :: These files are the scripts which assemble the data
for a template.
* /www/*.adp :: These are the standard templates for any individual
package.
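* /(tcl|www)/*.xql :: XML files which hold the SQL queries for the
matching .tcl file, kept separately for each database. The
OpenACS people have decided to try and keep ORACLE
support for their software through this "crude" method of
database abstraction.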
Anatomy:
/tcl/*.tcl
Excerpt from ${ACSHOME}/packages/acs-tcl/tcl/object-procs.tcl
ad_library {
    Object support for ACS.
    @author Jon Salz (jsalz@arsdigita.com)
    @creation-date 11 Aug 2000
    @cvs-id $Id: object-procs.tcl,v 1.2 2002/09/10 22:22:14 jeffd Exp $
}
ad_library stores some basic text in a javadoc-esque format.
When the library is loaded, and if documentation creation is
enabled, then the block of text in {}'s is written into a
documentation file. Similarly ad_proc documents each of the
proc's inside the libraries.
ad_proc -private acs_lookup_magic_object { name } {
    Returns the object ID of a magic object (performing no memoization).
} {
    return [db_string magic_object_select {
        select object_id from acs_magic_objects where name = :name
    }]
}
ad_proc is a replacement for the TCL standard proc. It allows
you to mark a procedure as public or private. db_string
uses the ACS db_api discussed later. There are many other procs
inside this file.
Excerpt from ${ACSHOME}/packages/lars-blogger/www/index.tcl
Show them this page in action then step through the source-code.
Excerpt from ${ACSHOME}/packages/lars-blogger/www/index.adp
Show them the page and explain the template.
OpenACS from Startup to Page request:
OpenACS starts up with AOLServer starting up. The first thing
aolserver does is load its compiled modules. A working version of
OpenACS requires:
    nssock.so (socket listener)
    nslog.so (logger)
    postgres.so (postgres driver)
    nssha1.so (SHA-1 hashing library used for hashing passwords etc.)
    nscache.so (caching system which allows blocks of text to be shared between threads)
    nsrewrite.so (similar to mod_rewrite for apache .. used by the request processor to
                  internally redirect requests).
    nsxml.so (link between aolserver and libxml2)
    nsssl.so (the secure socket listener .. unnecessary if you aren't worried about
              security, but all real webservices should be)
All of the libraries are written in C for speed. Many of them
don't come w/ the standard AOLServer distribution and must be
compiled separately.
From there it loads the libraries under $ACS_HOME/tcl . These libraries
tell it to start up the bootstrapping package.
The bootstrapping package includes all the basic libraries that
every package needs to load. These include the db-api, the package
manager, the utilities, and the procs for making procs.
From there it:
* Connects to database to make sure it works as expected.
* Loads up the acs-tcl package which includes the core tcl
packages which every package on the system can require.
* Checks the database to see what other packages are currently
installed (if none then starts the installation process).
-- Demonstrate the installation process? If I can get it to
work on my MacTop.
* Load the site map, confirm that there is an admin user
* Load some basic parameters
* Load the libraries of the individual packages
* Register filters necessary for the Request Processor
* Binds to the sockets
* Starts to run scheduled procedures (a facility of AOLserver).
Receiving Requests.
ACS 3.x used to have file extensions.
* /pants/buckle.html :: loads the static html
${ACSHOME}/www/pants/buckle.html
* /pants/zipper.tcl :: ${ACSHOME}/www/pants/zipper.tcl ..runs a
script which would ns_write a response to
the request. Until the script ends the
connection is held open so you can keep
writing to the connection as many times as
you want, extending the data. This is
typically bad policy, but is good if you
want to give an interactive feeling when
reporting on the progress of some action
.. like installation.
* /pants/leg.adp :: ${ACSHOME}/www/pants/leg.adp is an AOLServer
template. These allow programmers to mix code and
HTML in the same files .. similar to JSP.
It should be noted that .adp files are completely inadequate to
their purpose. I've worked with several template designers who were
uncomfortable with the mix of code and static HTML. There were many
real problems with this system and it was not as capable as the JSP
syntax which allowed you to have a chunk of HTML in a loop (for
instance). Eventually a new templating system was created (show
later) which made it easier for the graphic designers to move
variables and html around. It is completely custom to the ACS and
suffers because of that. Some of us are pushing for them to adopt a
more standard HTML::Template syntax based on a PERL module. It has
been ported to PHP, Java, Python, etc. etc.
ACS 4.x :
There was a big push at aD in mid 2000 for what are called abstract
URLs. This would allow you to eliminate file extensions and have a
request processor cleverly figure out what pages to load. This was
done to hide implementation details from users and prevent their
bookmarks from breaking when you altered implementation. ACS 3.x
had a hack to support this .. but it wasn't put into mainstream
usage until 4.x.
With the birth of the request processor, a site-map, abstract urls,
and packages a whole new level of overhead was added to each request.
When a request is received for /ticket-tracker (for example) the
request processor receives the call.
IMP Detail:
The request processor uses an aolserver facility called filters.
Filters are assigned to url patterns (in the case of the RP the
pattern is /*). The filter then runs some code before exiting
with a filter code:
FILTER_OK :: run the next filter or load the page.
FILTER_BREAK :: don't run any more filters, and load the page.
FILTER_RETURN :: close the connection; the filter has done all
the work.
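A sketch of how the three filter codes steer a request. This is not AOLserver's implementation: run_request, the filter names and the glob matching through Python's fnmatch are assumptions made for illustration, and the constant values are stand-ins.

```python
from fnmatch import fnmatch

# Stand-in values for the three filter codes described above.
FILTER_OK = "filter_ok"
FILTER_BREAK = "filter_break"
FILTER_RETURN = "filter_return"


def run_request(url, filters, load_page):
    """Run matching filters in order, then load the page unless told not to."""
    trace = []
    for pattern, name, run_filter in filters:
        if not fnmatch(url, pattern):
            continue                  # filter not registered for this url
        trace.append(name)
        code = run_filter(url)
        if code == FILTER_RETURN:
            return trace              # filter answered the request itself
        if code == FILTER_BREAK:
            break                     # skip remaining filters, still load page
    trace.append(load_page(url))
    return trace


def load_page(url):
    return "page:" + url


# Hypothetical filters: the RP on /*, a login check on /admin/*, a logger.
filters = [
    ("/*", "request_processor", lambda url: FILTER_OK),
    ("/admin/*", "login_check", lambda url: FILTER_RETURN),
    ("/*", "logger", lambda url: FILTER_OK),
]

normal = run_request("/ticket-tracker", filters, load_page)
admin = run_request("/admin/users", filters, load_page)
broken = run_request("/x", [("/*", "first", lambda url: FILTER_BREAK),
                            ("/*", "second", lambda url: FILTER_OK)], load_page)

print(normal)
print(admin)
print(broken)
```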
The request processor has to do several things:
* check to see if ${ACS_HOME}/www/ticket-tracker.* exists .. if it doesn't then
* figure out which package is mounted at /ticket-tracker/ by
checking the site map.
* since there is an instance of ticket-tracker mounted at
/ticket-tracker we then look for
${ACS_HOME}/packages/ticket-tracker/www/index.*
* since there is an index.tcl we load that.
* The RP also stores some variables: the package_id on the
map, the site_node_id, the user's form variables, and the user's
cookies.
* The RP will try to load files by extension in the order: tcl,
adp, html, htm, then whatever comes first alphabetically.
(Abstract URL support)
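The lookup steps above can be sketched as follows. This is an illustration in Python, not the request processor's actual code. The resolve and pick_by_extension functions, the in-memory file set and the site_map dictionary are all made up, and package pages are assumed to live under the package's www directory as described in the directory structure.

```python
PREFERRED = ["tcl", "adp", "html", "htm"]


def pick_by_extension(stem, files):
    """Return the file matching stem.* with the most preferred extension."""
    candidates = []
    for path in files:
        if path.startswith(stem + "."):
            ext = path[len(stem) + 1:]
            if "/" not in ext:        # ignore files in deeper directories
                candidates.append((path, ext))
    if not candidates:
        return None

    def rank(item):
        ext = item[1]
        # Preferred extensions first, then anything else alphabetically.
        position = PREFERRED.index(ext) if ext in PREFERRED else len(PREFERRED)
        return (position, ext)

    return min(candidates, key=rank)[0]


def resolve(url, files, site_map):
    path = url.strip("/") or "index"
    # Step 1: a page directly under the server's www directory.
    hit = pick_by_extension("www/" + path, files)
    if hit:
        return hit
    # Step 2: find the package mounted at the longest matching site node.
    for mount in sorted(site_map, key=len, reverse=True):
        node = mount.strip("/")
        if path == node or path.startswith(node + "/"):
            rest = path[len(node):].strip("/") or "index"
            stem = "packages/" + site_map[mount] + "/www/" + rest
            return pick_by_extension(stem, files)
    return None


# Hypothetical files on disk and a one-entry site map.
files = {
    "www/index.adp",
    "www/index.tcl",
    "packages/ticket-tracker/www/index.adp",
    "packages/ticket-tracker/www/index.tcl",
    "packages/ticket-tracker/www/report.html",
}
site_map = {"/ticket-tracker": "ticket-tracker"}

front = resolve("/", files, site_map)
tracker = resolve("/ticket-tracker", files, site_map)
report = resolve("/ticket-tracker/report", files, site_map)
missing = resolve("/nowhere", files, site_map)

print(front, tracker, report, missing)
```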
Site Map:
Each package can be instantiated. Package instances can be mounted
on the site map. Some packages are developed so that different
instances can share the same tables but are completely invisible to
each other. If you are setting up a group of bboards for different
communities .. you want the communities to all have bboard but not
to read each other's bboard. A package instance can be mapped to
any # of site-nodes.
How to build applications
What the advantages are over starting itself
Weaknesses/Advantages
architecture point of view
Pictures
by admin, last modified 2003-04-28 10:44