HTTP (HyperText Transfer Protocol) is the standard protocol, defined by the IETF, for transferring information between a web-client (browser) and a web-server. The protocol is a simple envelope protocol where standard name/value pairs in the header are used to split the stream into messages and communicate about the connection status. Many languages have client and/or server libraries to deal with the HTTP protocol, making it a suitable candidate for general-purpose client-server applications.
In this document we describe a modular infrastructure to access web-servers from SWI-Prolog and to turn Prolog into a web-server.
This work has been carried out under the following projects: GARP, MIA, IBROW, KITS and MultiMediaN. The following people have pioneered parts of this library and contributed bug reports and suggestions for improvements: Anjo Anjewierden, Bert Bredeweg, Wouter Jansweijer, Bob Wielinga, Jacco van Ossenbruggen, Michiel Hildebrandt, Matt Lilley and Keri Harris.
This package provides two libraries for building HTTP clients. The first, library(http/http_open), is a lightweight library for opening an HTTP URL address as a Prolog stream. It can only deal with the HTTP GET method. The second, library(http/http_client), is a more advanced library dealing with keep-alive connections, chunked transfer and a plug-in mechanism providing conversions based on the MIME content-type.
This library provides a lightweight HTTP client to get data from a URL. The functionality of the library can be extended by loading additional modules that act as plugins; for example, loading library(http/http_header) adds support for the POST method in addition to GET and HEAD.
Here is a simple example to fetch a web-page:
?- http_open('http://www.google.com/search?q=prolog', In, []),
   copy_stream_data(In, user_output),
   close(In).
<!doctype html><head><title>prolog - Google Search</title><script>
...
The example below fetches the modification time of a web-page. Note that Modified is '' if the web-server does not provide a time-stamp for the resource. See also parse_time/2.
modified(URL, Stamp) :-
    http_open(URL, In,
              [ method(head),
                header(last_modified, Modified)
              ]),
    close(In),
    Modified \== '',
    parse_time(Modified, Stamp).
method(+Method)
    One of get (default) or head. The head message can be used in combination with the header(Name, Value) option to access information on the resource without actually fetching the resource itself. The returned stream must be closed immediately. If library(http/http_header) is loaded, http_open/3 also supports post. See the post(Data) option.
size(-Size)
    Size is unified with the integer value of the Content-Length field in the reply header.
timeout(+Timeout)
    Raise an exception if no data arrives within Timeout seconds (default infinite).
authorization(+Authorization)
    Send authorization. See also http_set_authorization/2.
user_agent(+Agent)
    Defines the value of the User-Agent field of the HTTP header. Default is SWI-Prolog (http://www.swi-prolog.org).
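As an illustrative sketch of combining these options, the query below fetches a page with a timeout and a custom agent string and prints the size reported by the server. The URL and agent name are placeholders; the option names (timeout, user_agent, size) are those of http_open/3.

```prolog
% Sketch: fetch a page with a 10-second timeout and a custom
% User-Agent, then report the Content-Length given by the server.
% The URL and agent string are placeholders for illustration.
:- use_module(library(http/http_open)).

fetch(URL) :-
    http_open(URL, In,
              [ timeout(10),
                user_agent('my-client/1.0'),
                size(Size)
              ]),
    format('Content-Length: ~w~n', [Size]),
    copy_stream_data(In, user_output),
    close(In).
```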
The hook http:open_options/2 can be used to provide default options based on the broken-down URL.
URL | is either an atom (url) or a list of parts. If this list is provided, it may contain the fields scheme, user, password, host, port, path and search (where the argument of the latter is a Name(Value) list). Only host is mandatory. |
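For illustration, the parts-list form of URL can be used as below. The host and path are placeholders; only host is required.

```prolog
% Sketch: open a URL given as a list of parts rather than an atom.
% www.example.com and /index.html are placeholders.
:- use_module(library(http/http_open)).

open_by_parts(In) :-
    http_open([ host('www.example.com'),
                port(80),
                path('/index.html')
              ], In, []).
```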
If Authorization is the atom -, a possibly defined authorization is cleared. For example:
?- http_set_authorization('http://www.example.com/private/', basic('John', 'Secret'))
:- multifile http:open_options/2.

http:open_options(Parts, Options) :-
    memberchk(host(Host), Parts),
    Host \== localhost,
    Options = [proxy('proxy.local', 3128)].
Emit the Cookie: header for the current connection. Out
is an open stream to the HTTP server, Parts is the
broken-down request (see uri_components/2)
and Options is the list of options passed to http_open. The
predicate is called as if using ignore/1.
The library(http/http_client) library

The library(http/http_client)
library provides more
powerful access to reading HTTP resources, providing keep-alive
connections,
chunked transfer and conversion of the content, such as
breaking down multipart data, parsing HTML, etc. The library
announces itself as providing HTTP/1.1.
connection(+Connection)
    If close (default) a new connection is created for this request and closed after the request has completed. If 'Keep-Alive' the library checks for an open connection on the requested host and port and re-uses this connection. The connection is left open if the other party confirms the keep-alive and closed otherwise.
http_version(+Major-Minor)
    HTTP version to announce. Default is 1.1.
authorization(+Authorization)
    See the authorization option of http_open/3.
timeout(+Timeout)
    Raise an exception if no data arrives within Timeout seconds (default infinite).
user_agent(+Agent)
    Defines the value of the User-Agent field of the HTTP header. Default is SWI-Prolog (http://www.swi-prolog.org).
range(+Range)
    Ask for partial content, where Range is a term Unit(From, To). From is an integer and To is either an integer or the atom end. HTTP 1.1 only supports Unit = bytes. E.g., to ask for bytes 1000-1999, use the option range(bytes(1000,1999)).

Remaining options are passed to http_read_data/3.
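A sketch of a partial download using the range option shown above; the URL is a placeholder.

```prolog
% Sketch: fetch only the first 100 bytes of a resource using the
% range option.  The URL passed to first_100_bytes/2 is a placeholder.
:- use_module(library(http/http_client)).

first_100_bytes(URL, Data) :-
    http_get(URL, Data,
             [ range(bytes(0, 99))
             ]).
```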
Options is a list of Name(Value) pairs to guide the translation of the data. The following options are supported:

content_type(+Type)
    Overrule the Content-Type as provided by the HTTP reply header. Intended as a work-around for badly configured servers.
If no to(Target)
option is provided the library tries
the registered plug-in conversion filters. If none of these succeed it
tries the built-in content-type handlers or returns the content as an
atom. The builtin content filters are described below. The provided
plug-ins are described in the following sections.
Finally, if all else fails the content is returned as an atom.
html(+Tokens)
    Send an HTML token list as produced by the library(html_write) described in section 3.15.
xml(+XMLTerm)
    Send an XML document with MIME type text/xml.
xml(+Type, +XMLTerm)
    As xml(XMLTerm), using the provided MIME type, i.e., the Content-type equals Type.
form(+ListOfParameter)
    Send data of MIME type application/x-www-form-urlencoded as produced by browsers issuing a POST request from an HTML form. ListOfParameter is a list of Name=Value or Name(Value).
form_data(+ListOfData)
    Send data of MIME type multipart/form-data as produced by browsers issuing a POST request from an HTML form using enctype multipart/form-data. This is a somewhat simplified MIME multipart/mixed encoding used by browser forms including file input fields. ListOfData is the same as for the List
alternative described below. Below is an example from the SWI-Prolog
Sesame interface. Repository,
etc. are atoms providing the value, while the last argument provides a
value from a file.
...,
http_post([ protocol(http),
            host(Host),
            port(Port),
            path(ActionPath)
          ],
          form_data([ repository = Repository,
                      dataFormat = DataFormat,
                      baseURI    = BaseURI,
                      verifyData = Verify,
                      data       = file(File)
                    ]),
          _Reply, []),
...,
If Data is a plain list, it is sent using the MIME type multipart/mixed and packed using mime_pack/3. See mime_pack/3 for details on the argument format.
This plug-in library library(http/http_mime_plugin)
breaks multipart documents that are recognised by the Content-Type:
multipart/form-data
or Mime-Version: 1.0
in the
header into a list of Name = Value pairs. This
library deals with data from web-forms using the multipart/form-data
encoding as well as the FIPA
agent-protocol messages.
This plug-in library library(http/http_sgml_plugin)
provides a bridge between the SGML/XML/HTML parser provided by library(sgml)
and the http client library. After loading this hook the following
mime-types are automatically handled by the SGML parser.
text/html
    Handed to library(sgml) using the W3C HTML 4.0 DTD, suppressing and ignoring all HTML syntax errors. Options is passed to load_structure/3.
text/xml
    Handed to library(sgml) using dialect xmlns (XML + namespaces). Options is passed to load_structure/3. In particular, dialect(xml) may be used to suppress namespace handling.
text/x-sgml
    Handed to library(sgml) using dialect sgml. Options is passed to load_structure/3.
The HTTP server library consists of two obligatory parts and one optional part. The first deals with connection management and has three different implementations depending on the desired type of server. The second implements a generic wrapper for decoding the HTTP request, calling user code to handle the request and encoding the answer. The optional http_dispatch module can be used to assign HTTP locations (paths) to predicates. This design is summarised in figure 1.
Figure 1: Design of the HTTP server
The functional body of the user's code is independent from the selected server-type, making it easy to switch between the supported server types.
The server-body is the code that handles the request and formulates a
reply. To facilitate all mentioned setups, the body is driven by
http_wrapper/5.
The goal is called with the parsed request (see
section 3.8) as argument and current_output
set to a temporary buffer. Its task is closely related to that of a CGI script; it must write a header holding at least the Content-type field, followed by a body. Here is a simple body
writing the request as an HTML table.
reply(Request) :-
    format('Content-type: text/html~n~n', []),
    format('<html>~n', []),
    format('<table border=1>~n'),
    print_request(Request),
    format('~n</table>~n'),
    format('</html>~n', []).

print_request([]).
print_request([H|T]) :-
    H =.. [Name, Value],
    format('<tr><td>~w<td>~w~n', [Name, Value]),
    print_request(T).
The infrastructure recognises the header
Transfer-encoding: chunked, causing it to use chunked
encoding if the client allows for it. See also section
4 and the
chunked
option in http_handler/3.
Other header lines are passed verbatim to the client. Typical examples
are Set-Cookie and authentication headers (see section 3.5).
Besides returning a page by writing it to the current output stream,
the server goal can raise an exception using throw/1
to generate special pages such as not_found
, moved
,
etc. The defined exceptions are:
http_reply(+Reply)
    Equivalent to http_reply(Reply, []).
not_modified
    Equivalent to http_reply(not_modified, []). This exception is for backward compatibility and can be used by the server to indicate the referenced resource has not been modified since it was requested last time.
This module can be placed between http_wrapper.pl
and
the application code to associate HTTP locations to predicates
that serve the pages. In addition, it associates parameters with
locations that deal with timeout handling and user authentication. The
typical setup is:
server(Port, Options) :-
    http_server(http_dispatch,
                [ port(Port)
                | Options
                ]).

:- http_handler('/index.html', write_index, []).

write_index(Request) :-
    ...
http_path.pl
. If an HTTP
request arrives at the server that matches Path, Closure
is called with one extra argument: the parsed HTTP request.
Options is a list containing the following options:
http_authenticate.pl
provides a plugin for user/password based Basic
HTTP
authentication.
Use Transfer-encoding: chunked if the client allows for it.
If true on a prefix-handler (see prefix), possible children are masked. This can be used to (temporarily) overrule part of the tree.
:- http_handler(/, http_404([index('index.html')]),
                [spawn(my_pool), prefix]).
One of infinite, default or a positive number (seconds).
Note that http_handler/3 is normally invoked as a directive and processed using term-expansion. Using term-expansion ensures proper update through make/0 when the specification is modified. We do not expand when the cross-referencer is running to ensure proper handling of the meta-call.
If true (default), handle If-modified-since and send the modification time.
If false (default), validate that FileSpec does not contain references to parent directories. E.g., specifications such as www('../../etc/passwd') are not allowed.
If caching is not disabled, it processes the request headers If-modified-since and Range.
:- http_handler(root(.), http_redirect(moved, myapp('index.html')), []).
How | is one of moved , moved_temporary or see_other |
To | is an atom, an aliased path as defined by http_absolute_location/3, or a term location_by_id(Id). If To is not absolute, it is resolved relative to the current location. |
This module provides a simple API to generate an index for a physical directory. The index can be customised by overruling the dirindex.css CSS file and by defining additional rules for icons using the hook http:file_extension_icon/2.
The calling conventions allows for direct calling from http_handler/3.
This library defines session management based on HTTP cookies. Session management is enabled simply by loading this module. Details can be modified using http_set_session_options/1. If sessions are enabled, http_session_id/1 produces the current session and http_session_assert/1 and friends maintain data about the session. If the session is reclaimed, all associated data is reclaimed too.
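As an illustrative sketch of these predicates, the handler below counts visits per session. The handler name and the visits/1 term are made up for illustration.

```prolog
% Sketch: per-session visit counter using http_session_data/1,
% http_session_retract/1 and http_session_assert/1.  The handler
% name and the visits/1 term are illustrative only.
:- use_module(library(http/http_session)).

visits(_Request) :-
    (   http_session_data(visits(N0))
    ->  http_session_retract(visits(N0)),
        N is N0 + 1
    ;   N = 1
    ),
    http_session_assert(visits(N)),
    format('Content-type: text/plain~n~n'),
    format('You have visited this page ~w times in this session.~n', [N]).
```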
Begin and end of sessions can be monitored using library(broadcast). The broadcasted messages are:
For example, the following call runs end_session(SessionId) whenever a session terminates. Please note that session ends are not scheduled to happen at the actual timeout moment of the session. Instead, creating a new session scans the active list for timed-out sessions. This may change in future versions of this library.
:- listen(http_session(end(SessionId, Peer)), end_session(SessionId)).
cookie(+Name)
    Name of the cookie. Default is swipl_session.
path(+Path)
    Path to which the cookie is associated. Default is /. Cookies are only sent if the HTTP request path is a refinement of Path.
SessionId | is an atom. |
The current session id is maintained in the backtrackable global variable http_session_id.
Using a backtrackable global variable is safe because continuous worker
threads use a failure driven loop and spawned threads start without any
global variables. This variable can be set from the command line to fake running a goal in the context of a session.
http_session(end(SessionId, Peer))
The broadcast is done before the session data is destroyed and the listen-handlers are executed in the context of the session that is being closed. Here is an example that destroys a Prolog thread associated with the session:
:- listen(http_session(end(SessionId, _Peer)),
          kill_session_thread(SessionId)).

kill_session_thread(SessionId) :-
    http_session_data(thread(ThreadId)),
    thread_signal(ThreadId, throw(session_closed)).
Succeed without any effect if SessionID does not refer to an active session.
The module http/http_authenticate
provides the basics to
validate an HTTP Authorization header. User and password
information are read from a Unix/Apache compatible password file. This
information, as well as the validation process, is cached to achieve optimal performance.
Basic
authentication and verify the password from
PasswordFile. PasswordFile is a file holding usernames and passwords in a format compatible to Unix and Apache. Each line is a record with colon-separated fields. The first field is the username and the second the password hash. Password hashes are validated using crypt/2.
Successful authorization is cached for 60 seconds to avoid overhead of decoding and lookup of the user and password data.
http_authenticate/3 just validates the header. If authorization is not provided the browser must be challenged, in response to which it normally opens a user-password dialogue. Example code realising this is below. The exception causes the HTTP wrapper code to generate an HTTP 401 reply.
...,
(   http_authenticate(basic(passwd), Request, User)
->  true
;   throw(http_reply(authorise(basic, Realm)))
).
Alternatively basic(+PasswordFile)
can be passed as an
option to
http_handler/3.
This library implements the OpenID protocol (http://openid.net/). OpenID is a protocol to share identities on the network. The protocol itself uses simple basic HTTP, adding reliability using digitally signed messages.
Steps, as seen from the consumer (or relying partner):

1. Obtain the OpenID identifier entered by the user in the openid_identifier field.
2. Fetch the page identified by openid_identifier and look up the <link rel="openid.server" href="server"> element.
3. Redirect the browser to the OpenID server with a checkid_setup message, asking to validate the given OpenID.
A consumer (an application that allows OpenID login) typically uses this library through openid_user/3. In addition, it must implement the hook http_openid:openid_hook(trusted(OpenId, Server)) to define accepted OpenID servers. Typically, this hook is used to provide a white-list of acceptable servers. Note that accepting any OpenID server is possible, but anyone on the internet can set up a dummy OpenID server that simply grants and signs every request. Here is an example:
:- multifile http_openid:openid_hook/1.

http_openid:openid_hook(trusted(_, OpenIdServer)) :-
    (   trusted_server(OpenIdServer)
    ->  true
    ;   throw(http_reply(moved_temporary('/openid/trustedservers')))
    ).

trusted_server('http://www.myopenid.com/server').
By default, information about who is logged in is maintained with the session using http_session_assert/1 with the term openid(Identity). The hooks login/logout/logged_in can be used to provide alternative administration of logged-in users (e.g., based on client IP, using cookies, etc.).
To create a server, you must do four things: bind the handlers
openid_server/2 and openid_grant/1
to HTTP locations, provide a user-page for registered users and define
the grant(Request, Options) hook to verify your users. An example server
is provided in
<plbase>/doc/packages/examples/demo_openid.pl
handler(Request) :-
    openid_user(Request, OpenID, []),
    ...
If the user is not yet logged on a sequence of redirects will follow:
Options:
The default login page is /openid/login, registered with http_dispatch.pl.
The OpenId server will redirect to the openid.return_to URL.
OpenIDLogin | ID as typed by user (canonized) |
OpenID | ID as verified by server |
Server | URL of the OpenID server |
After openid_verify/2 has
redirected the browser to the OpenID server, and the OpenID
server did its magic, it redirects the browser back to this address. The
work is fairly trivial. If
mode
is cancel
, the OpenId server denied. If id_res
,
the OpenId server replied positive, but we must verify what the server
told us by checking the HMAC-SHA signature.
This call fails silently if there is no openid.mode field in the request.
yes
, check the authority (typically the password) and if
all looks good redirect the browser to ReturnTo, adding the OpenID
properties needed by the Relying Party to verify the login.

The library library(http/http_parameters)
provides two
predicates to fetch HTTP request parameters as a type-checked list
easily. The library transparently handles both GET and POST requests. It
builds on top of the low-level request representation described in
section 3.8.
error(existence_error(form_data, Name), _) is
thrown. Options fall into three categories: those that handle presence
of the parameter, those that guide conversion and restrict types and
those that support automatic generation of documentation. First, the
presence-options:
If the option list(Type) is present, the options default and optional are ignored and the value is returned as a list. Type checking options are processed on each value.
The type and conversion options are given below. The type-language can be extended by providing clauses for the multifile hook http:convert_parameter/3.
For example, one can use (nonneg;oneof([infinite])) to specify an integer or a symbolic value.

The last set of options is to support automatic generation of HTTP API documentation from the sources. (This facility is under development in ClioPatria; see http_help.pl.)
Below is an example
reply(Request) :-
    http_parameters(Request,
                    [ title(Title, [ optional(true) ]),
                      name(Name,   [ length >= 2 ]),
                      age(Age,     [ between(0, 150) ])
                    ]),
    ...
Same as http_parameters(Request, Parameters, []).
Uses call(Goal, +ParamName, -Options) to find the options.
Intended to share declarations over many calls to http_parameters/3.
Using this construct the above can be written as below.
reply(Request) :-
    http_parameters(Request,
                    [ title(Title),
                      name(Name),
                      age(Age)
                    ],
                    [ attribute_declarations(param)
                    ]),
    ...

param(title, [optional(true)]).
param(name,  [length >= 2]).
param(age,   [integer]).
The body-code (see section 3.1) is
driven by a Request. This request is generated from http_read_request/2
defined in
library(http/http_header)
.
The parsed request is a list of Name(Value) elements. It provides a number of predefined elements for the result of
parsing the first line of the request, followed by the additional
request parameters. The predefined fields are:
host(Host)
    If the request contains Host: Host, Host is unified with the host-name. If Host is of the format <host>:<port>, Host only describes <host> and a field port(Port) where Port is an integer is added.
method(Method)
    Method is one of get, put or post. This field is present if the header has been parsed successfully.
peer(Peer)
    Peer is a term ip(A,B,C,D) containing the IP address of the contacting host.
port(Port)
    Port requested. See host for details.
path(Path)
    Path of the request.
search(ListOfNameValue)
    Part of the request after the ?, normally used to transfer data from HTML forms that use the `GET' protocol. In the URL it consists of a www-form-encoded list of Name=Value pairs. This is mapped to a list of Prolog Name=Value terms with decoded names and values. This field is only present if the location contains a search-specification.
http_version(Major-Minor)
    If the first line contains the HTTP/Major.Minor version indicator this element indicates the HTTP version of the peer. Otherwise this field is not present.
cookie(ListOfNameValue)
    If the header contains a Cookie line, the value of the cookie is broken down in Name=Value pairs, where the Name is the lowercase version of the cookie name as used for the HTTP fields.
set_cookie(set_cookie(Name, Value, Options))
    If the header contains a SetCookie line, the cookie field is broken down into the Name of the cookie, the Value and a list of Name=Value pairs for additional options such as expire, path, domain or secure.
If the first line of the request is tagged with
HTTP/
Major.Minor, http_read_request/2
reads all input up to the first blank line. This header consists of
Name:Value fields. Each such field appears as a
term
Name(Value)
in the Request, where Name
is canonised for use with Prolog. Canonisation implies that the
Name is converted to lower case and all occurrences of the
-
are replaced by _
. The value for the
Content-length
field is translated into an integer.
Here is an example:
?- http_read_request(user, X).
|: GET /mydb?class=person HTTP/1.0
|: Host: gollem
|:
X = [ input(user),
      method(get),
      search([ class = person ]),
      path('/mydb'),
      http_version(1-0),
      host(gollem)
    ].
Where the HTTP GET
operation is intended to get a
document, using a path and possibly some additional search
information, the POST
operation is intended to hand
potentially large amounts of data to the server for processing.
The Request parameter above contains the term method(post)
.
The data posted is left on the input stream that is available through
the term input(Stream)
from the Request header.
This data can be read using http_read_data/3
from the HTTP client library. Here is a demo implementation simply
returning the parsed posted data as plain text (assuming pp/1
pretty-prints the data).
reply(Request) :-
    member(method(post), Request), !,
    http_read_data(Request, Data, []),
    format('Content-type: text/plain~n~n', []),
    pp(Data).
If the POST is initiated from a browser, content-type is generally
either application/x-www-form-urlencoded
or
multipart/form-data
. The latter is broken down
automatically if the plug-in library(http/http_mime_plugin)
is loaded.
The functionality of the server should be defined in one Prolog file (of course this file is allowed to load other files). Depending on the desired server setup this `body' is wrapped into a small Prolog file combining the body with the appropriate server interface. There are three supported server setups. For most applications we advise the multi-threaded server. Examples of this server architecture are the PlDoc documentation system and the SeRQL Semantic Web server infrastructure.
All the server setups may be wrapped in a reverse proxy to make them available from the public web-server as described in section 3.9.7.
library(thread_httpd) for a multi-threaded server
This server is harder to debug due to the involved threading, although the GUI tracer provides reasonable support for multi-threaded applications using the tspy/1 command. It can provide fast communication to multiple clients and can be used for more demanding servers.
library(xpce_httpd) for an event-driven server
This server setup is very suitable for debugging as well as for embedded servers in simple applications in a fairly controlled environment.
library(inetd_httpd) for a server-per-client
This server is very hard to debug as the server is not connected to the user environment. It provides a robust implementation for servers that can be started quickly.
All the server interfaces provide http_server(:Goal, +Options)
to create the server. The list of options differ, but the servers share
common options:
The library(http/thread_httpd.pl)
provides the
infrastructure to manage multiple clients using a pool of worker-threads.
This realises a popular server design, also seen in Java Tomcat and
Microsoft .NET. As a single persistent server process maintains
communication to all clients startup time is not an important issue and
the server can easily maintain state-information for all clients.
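A minimal sketch of starting such a server is shown below; the port number and the reply predicate are made up for illustration.

```prolog
% Sketch: start a multi-threaded server on port 8080 that answers
% every request with a fixed plain-text page.  Port and predicate
% names are illustrative only.
:- use_module(library(http/thread_httpd)).

start :-
    http_server(reply, [port(8080)]).

reply(_Request) :-
    format('Content-type: text/plain~n~n'),
    format('Hello from SWI-Prolog~n').
```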
In addition to the functionality provided by the other (XPCE and
inetd) servers, the threaded server can also be used to realise an HTTPS
server exploiting the library(ssl)
library. See option
ssl(+SSLOptions)
below.
port(?Port)
option to specify the port the server should listen to. If Port
is unbound an arbitrary free port is selected and Port is
unified to this port-number. The server consists of a small Prolog
thread accepting new connections on Port and dispatching these
to a pool of workers. Defined Options are:
timeout(+Seconds)
    Maximum time to wait for a request to complete. Default is infinite, making each worker wait forever for a request to complete. Without a timeout, a worker may wait forever on a client that doesn't complete its request.
ssl(+SSLOptions)
    Use SSL (Secure Socket Layer) to realise the https:// protocol. SSL allows for encrypted communication to avoid others from tapping the wire as well as improved authentication of client and server. The SSLOptions option list is passed to ssl_init/3. The port option of the main option list is forwarded to the SSL layer. See the library(ssl) library for details.
This can be used to tune the number of workers for performance. Another possible application is to reduce the pool to one worker to facilitate easier debugging.
pool(Pool) or to thread_create/3 if the pool option is not present. If the dispatch module is used (see section
3.2), spawning is normally specified as an option to the http_handler/3
registration.
We recommend the use of thread pools. They allow registration of a set of threads using common characteristics, specify how many can be active and what to do if all threads are active. A typical application may define a small pool of threads with large stacks for computation-intensive tasks, and a large pool of threads with small stacks to serve media. The declaration could be the one below, allowing for at most 3 concurrent solvers with a backlog of 5, and at most 30 tasks creating image thumbnails with a backlog of 100.
:- use_module(library(thread_pool)).

:- thread_pool_create(compute, 3,
                      [ local(20000), global(100000), trail(50000),
                        backlog(5)
                      ]).
:- thread_pool_create(media, 30,
                      [ local(100), global(100), trail(100),
                        backlog(100)
                      ]).

:- http_handler('/solve',     solve,     [spawn(compute)]).
:- http_handler('/thumbnail', thumbnail, [spawn(media)]).
The library(http/xpce_httpd.pl)
provides the
infrastructure to manage multiple clients with an event-driven
control-structure. This version can be started from an interactive
Prolog session, providing a comfortable infrastructure to debug the
body of your server. It also allows the combination of an (XPCE-based)
GUI with web-technology in one application.
port(?Port)
option to specify the port the
server should listen to. If Port is unbound an arbitrary free
port is selected and Port is unified to this port-number.
Currently no options are defined.
The file demo_xpce
gives a typical example of this
wrapper, assuming demo_body
defines the predicate reply/1.
:- use_module(xpce_httpd).
:- use_module(demo_body).

server(Port) :-
    http_server(reply, Port, []).
The created server opens a server socket at the selected address and waits for incoming connections. On each accepted connection it collects input until an HTTP request is complete. Then it opens an input stream on the collected data and using the output stream directed to the XPCE socket it calls http_wrapper/5. This approach is fundamentally different compared to the other approaches:
All modern Unix systems handle a large number of the services they
run through the super-server inetd. This program reads
/etc/inetd.conf
and opens server-sockets on all ports
defined in this file. As a request comes in it accepts it and starts the
associated server such that standard I/O refers to the socket. This
approach has several advantages:
The very small generic script for handling inetd based connections is
in inetd_httpd
, defining http_server/1:
Here is the example from demo_inetd
#!/usr/bin/pl -t main -q -f :- use_module(demo_body). :- use_module(inetd_httpd). main :- http_server(reply).
With the above file installed in /home/jan/plhttp/demo_inetd
,
the following line in /etc/inetd
enables the server at port
4001 guarded by tcpwrappers. After modifying inetd, send the
daemon the HUP
signal to make it reload its configuration.
For more information, please check inetd.conf(5).
4001 stream tcp nowait nobody /usr/sbin/tcpd /home/jan/plhttp/demo_inetd
There are rumours that inetd has been ported to Windows.
To be done.
There are three options for public deployment of a service. One is to run it on a dedicated machine on port 80, the standard HTTP port. The machine may be a virtual machine running ---for example--- under VMWARE or XEN. The (virtual) machine approach isolates security threats and allows for using a standard port. The server can also be hosted on a non-standard port such as 8000 or 8080. Using non-standard ports however may cause problems with intermediate proxy- and/or firewall policies. Isolation can be achieved using a Unix chroot environment. Another option, also recommended for Tomcat servers, is the use of Apache reverse proxies. This causes the main web-server to relay requests below a given URL location to our Prolog based server. This approach has several advantages:
Note that the proxy technology can be combined with isolation methods such as dedicated machines, virtual machines and chroot jails. The proxy can also provide load balancing.
Setting up a reverse proxy
The Apache reverse proxy setup is really simple. Ensure the modules
proxy
and proxy_http
are loaded. Then add two
simple rules to the server configuration. Below is an example that makes
a PlDoc server on port 4000 available from the main Apache server at
port 80.
ProxyPass        /pldoc/ http://localhost:4000/pldoc/
ProxyPassReverse /pldoc/ http://localhost:4000/pldoc/
Apache rewrites the HTTP headers passing by, but using the above
rules it does not examine the content. This implies that URLs embedded
in the (HTML) content must use relative addressing. If the locations on
the public and Prolog server are the same (as in the example above) it
is allowed to use absolute locations. I.e. /pldoc/search
is
ok, but http://myhost.com:4000/pldoc/search
is not.
If the locations on the server differ, locations must be relative (i.e., not start with /).
This problem can also be solved using the contributed Apache module
proxy_html
that can be instructed to rewrite URLs embedded
in HTML documents. In our experience, this is not trouble-free as URLs
can appear in many places in generated documents. JavaScript can create
URLs on the fly, which makes rewriting virtually impossible.
The body is called by the module library(http/http_wrapper.pl)
.
This module realises the communication between the I/O streams and the
body described in section 3.1. The
interface is realised by
http_wrapper/5:
'Keep-alive'
if both ends of the connection want to
continue the connection or close
if either side wishes to
close the connection.
This predicate reads an HTTP request-header from In,
redirects current output to a memory file and then runs call(Goal,
Request)
, watching for exceptions and failure. If Goal
executes successfully it generates a complete reply from the created
output. Otherwise it generates an HTTP server error with additional
context information derived from the exception.
http_wrapper/5 supports the following options:
...,
format('Set-Cookie: ~w=~w; path=~w~n', [Cookie, SessionID, Path]),
...,
If ---for whatever reason--- the conversion is not possible it simply unifies RelPath to AbsPath.
This library finds the public address of the running server. This can
be used to construct URLs that are visible from anywhere on the
internet. This module was introduced to deal with OpenID, where a request
is redirected to the OpenID server, which in turn redirects to our
server (see http_openid.pl
).
The address is established from the settings http:public_host and http:public_port if provided. Otherwise it is deduced from the request.
true
(default false
), try to replace a
local hostname by a world-wide accessible name.
Simple module for logging HTTP requests to a file. Logging is enabled by loading this file and ensuring that the setting http:logfile is not the empty atom. The default file for writing the log is httpd.log.
See library(settings) for details.
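A minimal sketch to switch logging on; the log file name is our own choice:

```prolog
:- use_module(library(http/http_log)).
:- use_module(library(settings)).

% 'myapp.log' is an illustrative name; any writable path works.
:- set_setting_default(http:logfile, 'myapp.log').
```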
The level of logging can be modified using the multifile predicate http_log:nolog/1 to hide HTTP request fields from the logfile and http_log:password_field/1 to hide passwords from HTTP search specifications (e.g., /topsecret?password=secret).
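For instance, the following sketch (the field and parameter names are illustrative) hides the referer field and the value of a password parameter:

```prolog
:- use_module(library(http/http_log)).

:- multifile
    http_log:nolog/1,
    http_log:password_field/1.

% Do not write the HTTP referer field to the log.
http_log:nolog(referer(_)).

% Replace the value of ?password=... in logged search specifications.
http_log:password_field(password).
```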
The library library(http/http_error.pl) defines a hook that decorates uncaught exceptions with a stack-trace. This will generate a 500 internal server error document with a stack-trace. To enable this feature, simply load this library. Note that providing error information to the user simplifies the job of a hacker trying to compromise your server. It is therefore not recommended to load this file by default.
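One way to honour this advice is to load the library only in development builds, e.g. keyed on an environment variable. The variable name here is our invention:

```prolog
% Load the error-decoration hook only when MYAPP_DEBUG is set.
:- if(getenv('MYAPP_DEBUG', _)).
:- use_module(library(http/http_error)).
:- endif.
```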
The example program calc.pl
has the error handler loaded
which can be triggered by forcing a divide-by-zero in the calculator.
The library library(http/http_header) provides primitives for parsing and composing HTTP headers. Its functionality is normally hidden by the other parts of the HTTP server and client libraries. We provide a brief overview of http_reply/3, which can be accessed from the reply body using an exception as explained in section 3.1.1.
Field(Value). Type is one of:

- html(+HTML): Generate an HTML page using library(http/html_write) described in section 3.15.
- tmp_file(+MimeType, +Path): As file(+MimeType, +Path), but do not include a modification time header.
- cgi_stream(+Stream, +Len): As stream(+Stream, +Len), but the data on Stream must contain an HTTP header.

The library(http/html_write) library

Producing output for the web in the form of an HTML document is a requirement for many Prolog programs. Just using format/2 is not satisfactory as it leads to poorly readable programs generating poor HTML. This library is based on using DCG rules.
The library(http/html_write)
structures the generation
of HTML from a program. It is an extensible library, providing a DCG
framework for generating legal HTML under (Prolog) program control. It
is especially useful for the generation of structured pages (e.g. tables)
from Prolog data structures.
The normal way to use this library is through the DCG html//1. This non-terminal provides the central translation from a structured term with embedded calls to additional translation rules to a list of atoms that can then be printed using print_html/[1,2].
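A minimal, self-contained sketch of this workflow; the rule hello//1 and predicate print_hello/1 are our own names:

```prolog
:- use_module(library(http/html_write)).

% A DCG rule producing a fragment of HTML.
hello(Name) -->
    html([ h2('Greeting'),
           p(['Hello, ', Name, '!'])
         ]).

% Translate the specification to tokens and print them.
print_hello(Name) :-
    phrase(hello(Name), Tokens),
    print_html(Tokens).
```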
The specification terms accepted by html//1 are:

- []: The empty list emits nothing.
- \List: Emit the elements of List as raw, unquoted HTML.
- \Term: Invoke the grammar rule Term in the calling module.
- Module:Term: As \Term, but allows for invoking grammar rules in external packages.
- Tag(Content): Emit the HTML element Tag using Content as its content.
- Tag(Attributes, Content): Emit the HTML element Tag with Attributes and Content. Each attribute is of the form Name(Value) or Name=Value. Value is the atomic attribute value but allows for a limited functional notation:
  - encode(Atom): Emit the form-encoded version of Atom.
  - location_by_id(ID): Emit the HTTP location of the handler registered with the given ID.
  - A list of Name(Value) terms. Values are encoded as in the encode option described above.
  - A list of atomic values (e.g., NAMES). Each value in the list is separated by a space. This is particularly useful for setting multiple class attributes on an element. For example:
... span(class([c1,c2]), ...),
The example below generates a URL that references the predicate set_lang/1 in the application with given parameters. The http_handler/3 declaration binds /setlang to the predicate set_lang/1, for which we provide a very simple implementation. The code between ... is part of an HTML page showing the English flag which, when pressed, calls set_lang(Request), where Request contains the search parameter lang = en. Note that the HTTP location (path) /setlang can be moved without affecting this code.
:- http_handler('/setlang', set_lang, []).

set_lang(Request) :-
    http_parameters(Request,
                    [ lang(Lang, [])
                    ]),
    http_session_retractall(lang(_)),
    http_session_assert(lang(Lang)),
    reply_html_page(title('Switched language'),
                    p(['Switch language to ', Lang])).

...
html(a(href(location_by_id(set_lang) + [lang(en)]),
       img(src('/www/images/flags/en.png')))),
...
Generate a complete page including the DOCTYPE declaration. HeadContent are elements to be placed in the head element and BodyContent are elements to be placed in the body element.
To achieve a common style (background, page header and footer), it is possible to define DCG non-terminals head//1 and/or body//1. Non-terminal page//1 checks for the definition of these non-terminals in the module it is called from as well as in the user module. If no definition is found, it creates a head with only the HeadContent (note that the title is obligatory) and a body with bgcolor set to white and the provided BodyContent.
Note that further customisation is easily achieved using html//1 directly as page//2 is (besides handling the hooks) defined as:
page(Head, Body) -->
    html([ \['<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 4.0//EN">\n'],
           html([ head(Head),
                  body(bgcolor(white), Body)
                ])
         ]).
Generate a complete page including the DOCTYPE declaration and the HTML element. Contents is used to generate both the head and body of the page.
html_begin(table)
html_begin(table(border(2), align(center)))
This predicate provides an alternative to using the
\
Command syntax in the html//1 specification.
The following two fragments are the same. The preferred solution depends
on your preferences as well as whether the specification is generated or
entered by the programmer.
table(Rows) -->
    html(table([ border(1), align(center), width('80%') ],
               [ \table_header,
                 \table_rows(Rows)
               ])).

% or

table(Rows) -->
    html_begin(table(border(1),
                     align(center),
                     width('80%'))),
    table_header,
    table_rows(Rows),
    html_end(table).
The non-terminal html//1 translates a specification into a list of
atoms and layout instructions. Currently the layout instructions are
terms of the format nl(N)
, requesting at least N
newlines. Multiple consecutive nl(1)
terms are combined to
an atom containing the maximum of the requested number of newline
characters.
To simplify handing the data to a client or storing it into a file, the following predicates are available from this library:
Same as reply_html_page(default, Head, Body). The page is written to the current output, including the HTTP header required by library(http_wrapper) (CGI-style). Here is a typical example:
reply(Request) :-
    reply_html_page(title('Welcome'),
                    [ h1('Welcome'),
                      p('Welcome to our ...')
                    ]).
The header and footer of the page can be hooked using the grammar-rules user:head//2 and user:body//2. The first argument passed to these hooks is the Style argument of reply_html_page/3 and the second is the 2nd (for head//2) or 3rd (for body//2) argument of reply_html_page/3. These hooks can be used to restyle the page, typically by embedding the real body content in a div. E.g., the following code provides a menu on top of each page that is identified using the style myapp.
:- multifile user:body//2.

user:body(myapp, Body) -->
    html(body([ div(id(top), \application_menu),
                div(id(content), Body)
              ])).
Redefining the head can be used to pull in scripts, but typically html_requires//1 provides a more modular approach for pulling in scripts and CSS-files.
This value can be used for the Content-length field of an HTTP reply-header.
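The sketch below (the predicate name is ours) shows how html_print_length/2 supplies the Content-length value when writing a CGI-style reply by hand:

```prolog
:- use_module(library(http/html_write)).

reply_page(Title, Content) :-
    phrase(page(title(Title), Content), Tokens),
    format('Content-type: text/html~n'),
    html_print_length(Tokens, Len),
    format('Content-length: ~d~n~n', [Len]),
    print_html(Tokens).
```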
Modern HTML commonly uses CSS and Javascript. This requires <link> elements in the HTML <head> element or <script> elements in the <body>. Unfortunately this seriously harms re-using HTML DCG rules as components as each of these components may rely on their own style sheets or JavaScript code. We added a `mailing' system to reposition and collect fragments of HTML. This is implemented by html_post/4, html_receive/3 and html_receive/4.
Posted \-commands are executed by mailman/1 from print_html/1 or html_print_length/2. These commands are called in the calling context of the html_post/4 call.
A typical usage scenario is to get required CSS links in the document head in a reusable fashion. First, we define css/3 as:
css(URL) -->
    html_post(css,
              link([ type('text/css'),
                     rel('stylesheet'),
                     href(URL)
                   ])).
Next we insert the unique CSS links in the page head using the following call to reply_html_page/2:
reply_html_page([ title(...),
                  \html_receive(css)
                ],
                ...)
Typically, Handler collects the posted terms, creating a term suitable for html/3 and finally calls html/3.
The library predefines the receiver channel head at the end of the head element for all pages that write the HTML head through this library. The following code can be used anywhere inside an HTML generating rule to demand a JavaScript file in the header:
js_script(URL) -->
    html_post(head,
              script([ src(URL),
                       type('text/javascript')
                     ], [])).
This mechanism is also exploited to add XML namespace (xmlns) declarations to the (outer) html element using xhtml_ns/4:
Namespace declarations are posted on the xmlns channel. RDFa (http://www.w3.org/2006/07/SWD/RDFa/syntax/), embedding RDF in (X)HTML, provides a typical usage scenario where we want to publish the required namespaces in the header. We can define:
rdf_ns(Id) -->
    { rdf_global_id(Id:'', Value) },
    xhtml_ns(Id, Value).
After which we can use rdf_ns/3 as a normal rule in html/3 to publish namespaces from library(semweb/rdf_db). Note that this macro only has effect if the dialect is set to xhtml. In html mode it is silently ignored.
The required xmlns receiver is installed by html_begin/3 using the html tag and thus is present in any document that opens the outer html environment through this library.
In some cases it is practical to extend the translations imposed by html//1. When using XPCE, for example, it is comfortable to be able to define a default translation to HTML for objects. We also used this technique to define translation rules for the output of the SWI-Prolog library(sgml) package.
The html//1 non-terminal first calls the multifile ruleset html_write:expand//1.
html_quoted(Text)//: Emit Text, quoting the characters <, > and &.
html_quoted_attribute(Text)//: As html_quoted//1, but also quotes the double quote (") as required for attribute values.
Though not strictly necessary, the library attempts to generate reasonable layout in SGML output. It does this only by inserting newlines before and after tags, on the basis of the multifile predicate html_write:layout/3. The layout specification for the close-tag can be -, requesting the output generator to omit the close-tag altogether, or empty, telling the library that the element has declared empty content. In this case the close-tag is not emitted either, but in addition html//1 interprets Arg in Tag(Arg) as a list of attributes rather than the content.
A tag that does not appear in this table is emitted without additional layout. See also print_html/[1,2]. Please consult the library source for examples.
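For illustration, a declaration in the style of the rules found in the library source; the Before-After newline-pair notation is an assumption based on that source:

```prolog
:- multifile html_write:layout/3.

% Hypothetical: request two newlines before and one after <section>,
% and one newline before and two after </section>.
html_write:layout(section, 2-1, 1-2).
```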
In the following example we will generate a table of Prolog predicates we find from the SWI-Prolog help system based on a keyword. The primary database is defined by the predicate predicate/5. We will make hyperlinks for the predicates pointing to their documentation.
html_apropos(Kwd) :-
    findall(Pred, apropos_predicate(Kwd, Pred), Matches),
    phrase(apropos_page(Kwd, Matches), Tokens),
    print_html(Tokens).

% emit page with title, header and table of matches

apropos_page(Kwd, Matches) -->
    page([ title(['Predicates for ', Kwd])
         ],
         [ h2(align(center), ['Predicates for ', Kwd]),
           table([ align(center),
                   border(1),
                   width('80%')
                 ],
                 [ tr([ th('Predicate'),
                        th('Summary')
                      ])
                 | \apropos_rows(Matches)
                 ])
         ]).

% emit the rows for the body of the table.

apropos_rows([]) -->
    [].
apropos_rows([pred(Name, Arity, Summary)|T]) -->
    html([ tr([ td(\predref(Name/Arity)),
                td(em(Summary))
              ])
         ]),
    apropos_rows(T).

% predref(Name/Arity)
%
% Emit Name/Arity as a hyperlink to
%
%     /cgi-bin/plman?name=Name&arity=Arity
%
% we must do form-encoding for the name as it may contain illegal
% characters.  www_form_encode/2 is defined in library(url).

predref(Name/Arity) -->
    { www_form_encode(Name, Encoded),
      sformat(Href, '/cgi-bin/plman?name=~w&arity=~w',
              [Encoded, Arity])
    },
    html(a(href(Href), [Name, /, Arity])).

% Find predicates from a keyword.  '$apropos_match' is an internal
% undocumented predicate.

apropos_predicate(Pattern, pred(Name, Arity, Summary)) :-
    predicate(Name, Arity, Summary, _, _),
    (   '$apropos_match'(Pattern, Name)
    ->  true
    ;   '$apropos_match'(Pattern, Summary)
    ).
Remarks on the library(http/html_write) library

This library is the result of various attempts to arrive at a more satisfactory and Prolog-minded way to produce HTML text from a program. We have been using Prolog for the generation of web pages in a number of projects. Just using format/2 never was a real option, generating error-prone HTML from clumsy syntax. We started with a layer on top of format/2, keeping track of the current nesting and thus always capable of properly closing the environment.
DCG based translation however naturally exploits Prolog's term-rewriting primitives. If generation fails for whatever reason it is easy to produce an alternative document (for example holding an error message).
The approach presented in this library has been used in combination
with
library(http/httpd)
in three projects: viewing RDF in a
browser, selecting fragments from an analysed document and presenting
parts of the XPCE documentation using a browser. It has proven to be
able to deal with generating pages quickly and comfortably.
In a future version we will probably define a goal_expansion/2 to do compile-time optimisation of the library. Quotation of known text and invocation of sub-rules using the \RuleSet and <Module>:<RuleSet> operators are costly operations in the analysis that can be done at compile-time.
This library is a supplement to library(http/html_write) for producing JavaScript fragments. Its main role is to be able to call JavaScript functions with valid arguments constructed from Prolog data. E.g. suppose you want to call a JavaScript function to process a list of names represented as Prolog atoms. This can be done using the call below, while without this library you would have to be careful to properly escape special characters.
numbers_script(Names) -->
    html(script(type('text/javascript'),
                [ \js_call('ProcessNumbers'(Names))
                ])).
The accepted arguments are described with js_args/3.
...,
html(script(type('text/javascript'),
            [ \js_call('x.y.z'(hello, 42))
            ])),
...
js_new(Id, Term)//: Emit a JavaScript new command, expanding into ['var ', Id, ' = new ', \js_call(Term)].
js_expression(Expression)//: Emit a JavaScript expression. The Prolog term @(null) is emitted as null.
This module provides an abstract specification of HTTP server locations, inspired by absolute_file_name/3. The specification is done by adding rules to the dynamic multifile predicate http:location/3. The specification is very similar to user:file_search_path/2, but takes an additional argument with options. Currently only one option is defined:
The default priority is 0. Note however that notably libraries may decide to provide a fall-back using a negative priority. We suggest -100 for such cases.
This library predefines three locations at priority -100. The icons and css aliases are intended for images and CSS files and are backed by a file search path that allows finding the icons and CSS files that belong to the server infrastructure (e.g., http_dirindex/2).
http:prefix
Here is an example that binds /login to login/1. The user can reuse this application while moving all locations using a new rule for the admin location with the option [priority(10)].
:- multifile http:location/3.
:- dynamic   http:location/3.

http:location(admin, /, []).

:- http_handler(admin(login), login, []).

login(Request) :-
    ...
This library allows for abstract declaration of available CSS and Javascript resources and their dependencies using html_resource/2. Based on these declarations, html generating code can declare that it depends on specific CSS or Javascript functionality, after which this library ensures that the proper links appear in the HTML head. The implementation is based on the mail system implemented by html_post/2 of library html_write.pl.
Declarations come in two forms. First of all, HTTP locations are declared using the http_path.pl library. Second, html_resource/2 specifies HTML resources to be used in the head and their dependencies. Resources are currently limited to Javascript files (.js) and style sheets (.css). It is trivial to add support for other material in the head. See html_include/3.
For usage in HTML generation, there is the DCG rule html_requires/3 that demands named resources in the HTML head.
All calls to html_requires/3 for the page are collected and duplicates are removed. Next, the following steps are taken:
Use ?- debug(html(script)). to see the requested and final set of resources. All declared resources are in html_resource/3. The edit/1 command recognises the names of HTML resources.
If true (default false), do not include About itself, but only its dependencies. This allows for defining an alias for one or more resources.
Registering the same About multiple times extends the properties defined for About. In particular, this allows for adding additional dependencies to a (virtual) resource.
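A sketch of such declarations; the file locations and the alias name myapp are placeholders:

```prolog
:- use_module(library(http/html_head)).

% myapp.js depends on jquery.js; requiring one pulls in the other.
:- html_resource('/js/jquery.js', []).
:- html_resource('/js/myapp.js', [requires('/js/jquery.js')]).

% A virtual resource bundling script and style, used as \html_requires(myapp).
:- html_resource(myapp,
                 [ virtual(true),
                   requires([ '/js/myapp.js',
                              '/css/myapp.css'
                            ])
                 ]).
```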
The requirements are emitted in the head using html_post/2. The actual dependencies are computed during the HTML output phase by html_insert_resource/3.
This module provides convenience predicates to include PWP (Prolog Well-formed Pages) in a Prolog web-server. It provides the following predicates:
pwp_handler/2 and reply_pwp_page/3. The example below serves PWP files from the directory /web/pwp/.
user:file_search_path(pwp, '/web/pwp').

:- http_handler(root(.), pwp_handler([path_alias(pwp)]), [prefix]).
Options include:

- The index file used for directories is index.pwp.
- If true (default is false), allow for ?view=source to serve the PWP file as source.
Options supported are:

- If true (default false), process the PWP file in a module constructed from its canonical absolute path. Otherwise, the PWP file is processed in the calling module.
Initial context: the HTTP request method, one of get, post, put or head.
While processing the script, the file-search-path pwp includes the current location of the script. I.e., the following will find myprolog in the same directory as where the PWP file resides.
pwp:ask="ensure_loaded(pwp(myprolog))"
Writing servers is an inherently dangerous job that should be carried out with some care. You have basically started a program on a public terminal and invited strangers to use it. When using the interactive server or an inetd based server, the server runs under your privileges. Using CGI scripts it runs with the privileges of your web-server. Though it should not be possible to fatally compromise a Unix machine using user privileges, getting unconstrained access to the system is highly undesirable.
Symbolic languages have an additional handicap in their inherent possibilities to modify the running program and dynamically create goals (this also applies to the popular Perl and Java scripting languages). Here are some guidelines.
Not only /etc/passwd, but also ../../../../../etc/passwd are tried by experienced hackers to learn about the system they want to attack. So, expand provided names using absolute_file_name/[2,3] and verify they are inside a folder reserved for the server. Avoid symbolic links from this subtree to the outside world. The example below checks validity of filenames. The first call ensures proper canonisation of the paths to avoid a mismatch due to symbolic links or other filesystem ambiguities.
check_file(File) :-
    absolute_file_name('/path/to/reserved/area', Reserved),
    absolute_file_name(File, Tried),
    atom_concat(Reserved, _, Tried).
Before handing data to a shell command, e.g. through open(pipe(Command), ...), verify the argument once more.
reply(Query) :-
    member(search(Args), Query),
    member(action=Action, Args),
    member(arg=Arg, Args),
    call(Action, Arg).           % NEVER EVER DO THIS!
All your attacker has to do is specify Action as shell
and Arg as /bin/sh
and he has an uncontrolled
shell!
Avoid registering handlers directly on the server root (/). This is not a good idea. It is advised to have all locations in a server below a directory with an informative name. Consider making the root location something that can be changed using a global setting.
The HTTP protocol provides for transfer encodings. These define filters applied to the data described by the Content-type. The two most popular transfer encodings are chunked and deflate. The chunked encoding avoids the need for a Content-length header, sending the data in chunks, each of which is preceded by a length. The deflate encoding provides compression.
Transfer-encodings are supported by filters defined as foreign libraries that realise an encoding/decoding stream on top of another stream. Currently there are two such libraries: library(http/http_chunked.pl) and library(zlib.pl). There is an emerging hook interface dealing with transfer encodings. The library(http/http_chunked.pl) provides a hook used by library(http/http_open.pl) to support chunked encoding in http_open/3. Note that both http_open.pl and http_chunked.pl must be loaded for http_open/3 to support chunked encoding.
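A minimal sketch of loading both libraries; copying the body to user_output is just for demonstration:

```prolog
:- use_module(library(http/http_open)).
:- use_module(library(http/http_chunked)).  % adds chunked-decoding support

fetch(URL) :-
    setup_call_cleanup(
        http_open(URL, In, []),
        copy_stream_data(In, user_output),
        close(In)).
```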
The library(http/http_chunked) library

From http://json.org: "JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate. It is based on a subset of the JavaScript Programming Language, Standard ECMA-262 3rd Edition - December 1999. JSON is a text format that is completely language independent but uses conventions that are familiar to programmers of the C-family of languages, including C, C++, C#, Java, JavaScript, Perl, Python, and many others. These properties make JSON an ideal data-interchange language."
JSON is interesting to Prolog because using AJAX web technology we can easily create web-enabled user interfaces where we implement the server side using the SWI-Prolog HTTP services provided by this package. The interface consists of three libraries:
- library(http/json) provides support for the core JSON object serialization.
- library(http/json_convert) converts between the primary representation of JSON terms in Prolog and more application oriented Prolog terms. E.g. point(X,Y) vs. object([x=X,y=Y]).
- library(http/http_json) hooks the conversion libraries into the HTTP client and server libraries.
http_json.pl links JSON to the HTTP client and server modules. json_convert.pl converts JSON Prolog terms to more comfortable terms.
This module supports reading and writing JSON objects. The canonical Prolog representation for a JSON value is defined as:
- The JSON constants true and false are mapped, as in JPL, to @(true) and @(false).
- The JSON constant null is mapped to the Prolog term @(null).
Here is a complete example in JSON and its corresponding Prolog term.
{ "name":"Demo term",
  "created": { "day":null,
               "month":"December",
               "year":2007
             },
  "confirmed":true,
  "members":[1,2,3]
}
json([ name='Demo term',
       created=json([ day= @null,
                      month='December',
                      year=2007
                    ]),
       confirmed= @true,
       members=[1, 2, 3]
     ])
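Reading such a term from an atom can be sketched with atom_json_term/3 from library(http/json):

```prolog
:- use_module(library(http/json)).

demo(Term) :-
    atom_json_term('{"x":25, "y":50}', Term, []).
    % Term is unified with json([x=25, y=50])
```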
Strings may be represented as atom, string or codes. json_read/3 processes the following options:

- null(+NullTerm): Term used for the JSON constant null. Default @(null).
- true(+TrueTerm): Term used for the JSON constant true. Default @(true).
- false(+FalseTerm): Term used for the JSON constant false. Default @(false).
- value_string_as(+Type): Type used for representing JSON string values. The default is atom. The alternative is string, producing a packed string object. Please note that codes or chars would produce ambiguous output and is therefore not supported.
If json_read/3 encounters end-of-file before any real data it binds Term to the term @(end_of_file).
the true, false and null constants.
null. Conversion to Prolog could translate @null into a variable if the desired type is not any. Conversion to JSON could map variables to null, though this may be unsafe. If the Prolog term is known to be non-ground and JSON @null is a sensible mapping, we can also use this simple snippet to deal with that fact:
term_variables(Term, Vars),
maplist(=(@null), Vars).
The idea behind this module is to provide a flexible high-level mapping between Prolog terms as you would like to see them in your application and the standard representation of a JSON object as a Prolog term. For example, an X-Y point may be represented in JSON as {"x":25, "y":50}. Represented in Prolog this becomes json([x=25,y=50]), but this is a rather unnatural representation from the Prolog point of view.
This module allows for defining records (just like library(record)) that provide transparent two-way transformation between the two representations.
:- json_object point(x:integer, y:integer).
This declaration causes prolog_to_json/2 to translate the native Prolog representation into a JSON Term:
?- prolog_to_json(point(25,50), X). X = json([x=25, y=50])
A json_object/1 declaration can define multiple objects separated by a comma (,), similar to the dynamic/1 directive. Optionally, a declaration can be qualified using a module. The conversion predicates prolog_to_json/2 and json_to_prolog/2 first try a conversion associated with the calling module. If not successful, they try conversions associated with the module user.
JSON objects have no type. This can be solved by adding an extra field to the JSON object, e.g. {"type":"point", "x":25, "y":50}. As Prolog records are typed by their functor, we need some notation to handle this gracefully. This is achieved by adding +Fields to the declaration. I.e.
:- json_object point(x:integer, y:integer) + [type=point].
Using this declaration, the conversion becomes:
?- prolog_to_json(point(25,50), X). X = json([x=25, y=50, type=point])
The predicate json_to_prolog/2 is often used after http_read_json/2 and prolog_to_json/2 before reply_json/1. For now we consider them separate predicates because the transformation may be too general, too slow or not needed for dedicated applications. Using a separate step also simplifies debugging this rather complicated process.
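A round-trip sketch combining a json_object/1 declaration with both conversion predicates:

```prolog
:- use_module(library(http/json_convert)).

:- json_object point(x:integer, y:integer).

roundtrip :-
    prolog_to_json(point(25, 50), JSON),
    json_to_prolog(JSON, Term),
    Term == point(25, 50).
```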
The declaration is very similar to that of library(record). E.g.
?- json_object point(x:int, y:int, z:int=0).
Translate a Prolog term into a JSON term according to the :- json_object/1 declarations. If a json_object/1 declaration declares a field of type boolean, commonly used truth-values in Prolog are converted to JSON booleans. Boolean translation accepts one of true, on, 1, @true and false, fail, off, 0, @false.
Translate a JSON term into a Prolog application term according to the :- json_object/1 declarations. An efficient transformation is non-trivial, but we rely on the assumption that, although the order of fields in JSON terms is irrelevant and can therefore vary a lot, practical applications will normally generate the JSON objects in a consistent order. If a field in a json_object is declared of type boolean, @true and @false are translated to true or false, the most commonly used Prolog representation for truth-values.
json.pl describes how JSON objects are represented in Prolog terms. json_convert.pl converts between more natural Prolog terms and JSON terms.
This module inserts the JSON parser for documents of MIME type application/jsonrequest and application/json requested through the http_client.pl library.
Typically JSON is used by Prolog HTTP servers. Below is a skeleton for handling a JSON request, answering in JSON.
handle(Request) :-
    http_read_json(Request, JSONIn),
    json_to_prolog(JSONIn, PrologIn),
    <compute>(PrologIn, PrologOut),        % application body
    prolog_to_json(PrologOut, JSONOut),
    reply_json(JSONOut).
This module also integrates JSON support into the HTTP client provided by http_client.pl. Posting a JSON query and processing the JSON reply (or any other reply understood by http_read_data/3) is as simple as below, where Term is a JSON term as described in json.pl and Reply is of the same format if the server replies with JSON.
...,
http_post(URL, json(Term), Reply, [])
The SWI-Prolog HTTP library is in active use in a large number of projects. It is considered one of the SWI-Prolog core libraries that is actively maintained and regularly extended with new features. This is particularly true for the multi-threaded server. The XPCE and inetd based servers are not widely used.
This library is by no means complete and you are free to extend it.