- lookup, to return the MIME type of the file
requested by the client browser, based on its extension: html,
css, gif, ico, jpg, js (javascript), php, or
png
- parse, to extract the absolute
path and query from the client web browser's web page
request line, and return standard web error messages
if the request had an error
- load, to read all bytes of the requested web
page into dynamically allocated memory on the heap
and save a pointer to the content and its length;
for a php web page request, the content is instead
what the php interpreter returned from running the
requested php script, along with its length
- indexes, which, given a "path/to/a/directory",
returns the path to the index.php or index.html file
inside it, if one actually exists.
- This project helped me learn
to implement a web
server in a language I know very well, C, before
learning to do that in the more common way of
programming web servers these days in languages I do
not know as well.
- Future plans may be to expand it to work with a board
in a physical computing environment,
where the board responds to requests from the client
web browser.
- The socket communications between this server and
the client browser were done by the online course
people.
- The directory structure for this basic web server in
C was as follows, and I simply filled in the above
functions described in the file called
server.c.
- The directory contained these files for
getting started below.
Makefile
public directory
cat.html (which has an IMG tag whose src attribute
is cat.jpg, provided by the class project, and we love
all friendly animals!)
cat.jpg
favicon.ico
hello.html (has a form that's configured to submit via GET a text field called name to "hello.php")
hello.php (mostly HTML, but inside its body is a bit
of PHP code that deals with html special characters)
test
index.html
index.php
server.c (implements a web server that knows how to serve static content (i.e., files ending in
.html, .jpg, et al.) and dynamic content (i.e., files ending in .php))
- "Usage: server (port number) (path to root)"
To specify a (TCP) port number on which server
should listen for HTTP requests, include (port number) as a
command line argument.
If you do not specify a port number, the program will
default to port 8080.
The last command line argument to server should be
the PATH to your server's "root" (the directory from
which files will be served).
To Test:
Server is started with: "./server public"
Listening on port 8080 (should be output)
This uses the public directory as the server's root, the directory from
which files will be served.
Under the public dir is the "test/index.html" file.
You only specify directories or files underneath
the public directory after the localhost:8080 to
request web pages.
- Tests: in client browser while
this server is running on same machine:
- "http://localhost:8080/test/index.html" (starts a
video of singer)
- "http://localhost:8080/cat.jpg" (shows photo of
a happy cat, and we like all happy animals!)
- you should also see "GET /cat.jpg HTTP/1.1" in
your terminal window, which is the "request line"
that your browser sent to the server
- Below that you should see all of the headers
that your browser sent to the server, followed by
"HTTP/1.1 200 OK", which is the server's response
to the browser
- "http://localhost:8080/test/index.php" (should
do a dir listing) NOTE: still getting this to work!
- Another way to test it:
- Open up Chrome's developer tools, per the
instructions at
"https://developer.chrome.com/devtools"
- Then, once open, click the tools' Network
tab, and then, while holding down Shift, reload
the page.
- Not only should you see Happy Cat again, you
should also see the following in your terminal
window.
- "GET /cat.jpg HTTP/1.1"
- "HTTP/1.1 200 OK"
- You might also see the following.
- "GET /favicon.ico HTTP/1.1"
- "HTTP/1.1 200 OK"
- What's happening is, by convention, a lot of
websites have in their root directory a
"favicon.ico" file, which is a tiny icon that's
meant to be displayed in a browser's address bar or
tab. If you do see those lines in your terminal
window, that just means the browser (Chrome in
this example) is guessing that your server, too,
might have a "favicon.ico" file, which it
does!
- A Walkthrough Demo type test:
- "http://localhost:8080/cat.html" (shows photo of
a happy cat, with a margin around him, unlike when
it was just the jpg, due to Chrome's default CSS
properties)
- If you look at the Chrome developer tools'
Network tab (possibly after reloading, if they
weren't still open), you should see that Chrome
first requested "cat.html" followed by "cat.jpg",
since the latter, recall, was specified as the
value of the img element's src attribute that we
saw earlier in "cat.html".
- To confirm, take a look at the developer tools'
Elements tab, wherein you will see a pretty
printed version of the HTML in
"cat.html". You can even change it, but only
Chrome's in-memory copy thereof.
- You can also tinker with the developer tools' Styles
tab: even though this page doesn't have any CSS
of its own, you can see and change (temporarily)
Chrome's default CSS properties via that
tab.
- Next, visit "http://localhost:8080/hello.html" and
submit your name via its form, which sends it via GET
to "hello.php". If you look at the resulting page's
source code (as via
the developer tools' Elements tab), you will see
your name embedded within the HTML! By contrast,
files like "cat.jpg" and "cat.html" (and even
"hello.html") are "static" content, since they are
not dynamically generated.
- The browser is one way to test the server. Another
technique is to test it from a command line instead.
- Open up a second terminal window and position it
alongside your first.
- In the first terminal window, execute: "./server public"
from within your own "~/workspace/webserver"
directory, if the server isn't already
running.
- Then, in the second terminal window, execute the
below. (Note the "http://" this time instead of
"https://".) "curl -i http://localhost:8080/".
If you haven't used curl before, it is a command
line program with which you can send HTTP requests
(and more) to a server in order to see its
responses. The "-i" flag tells curl to include
the response's HTTP headers in the output. Odds are,
whilst debugging your server, you will find it more
convenient (and revealing!) to see all of that via
curl than by poking around Chrome's developer
tools. Incidentally, take care not to request
"cat.jpg" (or any binary file) via curl, else
you will see quite a mess!
- A tour of "server.c": what was
written by the online class people, and what I
wrote:
- In "server.c", only the lookup, parse, load, and
indexes functions were written by me, as described
in the previous section.
- Next I describe what was done by the online course
people in "server.c".
- Atop the file are a bunch of "feature test macro
requirements" that allow them to use certain functions
that are declared (conditionally) in the header files
further below.
- Defined next are a few constants that specify limits
on HTTP requests' sizes. They (arbitrarily) based their
values on defaults used by Apache, a popular web
server. See
"http://httpd.apache.org/docs/2.2/mod/core.html".
- Defined next is BYTES, a constant that specifies how
many bytes we will eventually be reading into buffers at
a time.
- Next are a bunch of header files, followed by a
definition of BYTE, which we have indeed defined as an
8-bit char, followed by a bunch of
prototypes.
- Finally, just above main are just a few global
variables.
- main: Atop main is an
initialization of what appears to be a global variable
called errno . In fact, errno is defined in "errno.h"
and is used by quite a few functions to indicate (via
an int ), in cases of error, precisely which error has
occurred. See man errno for more details.
- Shortly thereafter is a call to getopt , which is a
function declared in "unistd.h" that makes it easier
to parse command line arguments. See man 3 getopt if
curious.
- Notice how we use getopt (and some Boolean
expressions) to ensure that server is used
properly.
- Next notice the call to start (for which you may
have noticed a prototype earlier). More on that
later.
- Below that is a declaration of a struct sigaction
via which we will listen for SIGINT (i.e., "control c"),
calling handler (a function defined by us elsewhere in
"server.c" ) if heard.
- And then, after declaring some variables, main
enters an infinite while loop.
- Atop that loop, we first free any memory that might
have been allocated by a previous iteration of the
loop.
- We then check whether we have been "signalled" via
"control c" to stop the server.
- Thereafter, within an if statement, is a call to
"connected" , which returns true if a client (e.g., a
browser or even curl ) has connected to the
server.
- After that is a call to parse, which parses a
browser's HTTP request, storing its "absolute path"
and "query" inside of two arrays that are passed into
it by reference.
- Next is a bunch of code that decodes that path
(decoding any URL encoded characters like "%20" ) and
"resolves" the path to a local path, figuring out
exactly what file was requested on the server
itself.
- Below that, we ascertain whether that path leads to
a directory or to a file and handle the request
accordingly, ultimately calling list , interpret , or
transfer .
- For directories (that do not have an "index.php" or
"index.html" file inside them), we call list in order
to display the directory's contents.
- For files ending in ".php" (whose "MIME type" is
"text/x-php"), we call interpret.
- For other (supported) files, we call
transfer.
- And that is it for main! Notice, though, that
throughout main are a few uses of continue , the
effect of which is to jump back to the start of that
infinite loop. Just before continue in some cases,
too, is a call to error (another function they wrote)
with an HTTP status code. Together, those lines allow
the server to handle and respond to errors just before
returning its attention to new requests.
- connected: connected is defined below main. It
first calls "memset()" to fill the first sizeof(client
socket address) bytes of the memory area pointed to by
(client socket address) with the constant byte
zero.
- accept: extracts the first
connection request on the queue of pending connections
for the listening socket (server), sockfd, creates a
new connected socket, and returns a new file
descriptor referring to that socket (for this client
it just connected to).
- error: error calls "reason" to
look up the reason phrase for the given HTTP status
code and places it in a phrase string.
It forms a template string and then renders the template
into a body string and its length. It then adds the
headers and responds to the client with the error code,
headers, body, and length.
- freedir: This function exists
simply to facilitate freeing memory that is allocated
by a function called scandir that we call in
list.
- handler: This function (called
whenever a user hits "control c") essentially tells
main to call stop by setting signaled , a global
variable, to true .
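The SIGINT handling described above can be sketched roughly as follows. This is a minimal illustration, not the course staff's code: the helper name listen_for_sigint is hypothetical, and the feature test macro and the type of the flag are assumptions.

```c
#define _XOPEN_SOURCE 700 // assumed feature test macro so that sigaction is declared

#include <signal.h>
#include <stdbool.h>
#include <string.h>

// flag that the main loop would check atop each iteration
static volatile sig_atomic_t signaled = 0;

// called by the operating system when control-c (SIGINT) arrives;
// it only sets the flag, leaving the actual shutdown to main
static void handler(int signum)
{
    (void) signum;
    signaled = 1;
}

// hypothetical helper: registers handler for SIGINT, returning true on success
bool listen_for_sigint(void)
{
    struct sigaction act;
    memset(&act, 0, sizeof act);
    act.sa_handler = handler;
    sigemptyset(&act.sa_mask);
    return sigaction(SIGINT, &act, NULL) == 0;
}
```

Setting only a flag inside the handler keeps the handler itself trivial, so the cleanup can run later from ordinary code.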
- htmlspecialchars: This function,
named identically to that PHP function we saw earlier,
escapes characters (e.g., "<" as "&lt;") that might
otherwise "break" an HTML page. We call it from list,
lest some file or directory we are listing have a
"dangerous" character in its name.
- indexes: I wrote this function.
Given a "/path/to/a/directory", it
returns "/path/to/a/directory/index.php" if
"index.php" actually exists therein, or
"/path/to/a/directory/index.html" if "index.html"
actually exists therein, or NULL. In the first two of
those cases, this function should dynamically allocate
memory on the heap for the returned string.
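One way indexes could be written is sketched below. It is an illustration under assumptions rather than my exact code: it checks existence with access() and tolerates a trailing slash on the directory path.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

// Returns a heap-allocated "path/index.php" or "path/index.html" if such a
// file exists (checked in that order), else NULL. The caller must free it.
char *indexes(const char *path)
{
    static const char *names[] = {"index.php", "index.html"};
    size_t len = strlen(path);

    // avoid a doubled slash if path already ends in one
    const char *sep = (len > 0 && path[len - 1] == '/') ? "" : "/";

    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++)
    {
        size_t size = len + strlen(sep) + strlen(names[i]) + 1;
        char *candidate = malloc(size);
        if (candidate == NULL)
        {
            return NULL;
        }
        snprintf(candidate, size, "%s%s%s", path, sep, names[i]);
        if (access(candidate, F_OK) == 0)
        {
            return candidate;
        }
        free(candidate);
    }
    return NULL;
}
```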
- interpret: This function enables
the server to interpret PHP files. It is a bit cryptic
at first glance, but in a nutshell, all we are doing,
upon receiving a request for, say, "hello.php" , is
executing a line like "QUERY_STRING='name=Alice'
REDIRECT_STATUS=200 SCRIPT_FILENAME=/path/to/public"
the effect of which is to pass the contents of
"hello.php" to PHP's interpreter (i.e., "php-cgi"),
with any HTTP parameters supplied via an "environment
variable" called QUERY_STRING. Via load (a function I
wrote), we then read the interpreter's output into
memory. And then we respond to the browser
with that (dynamically generated) output.
- popen: That function opens a
"pipe" to a process ("php-cgi" in our case), which
provides us with a FILE pointer via which we can read
that process's standard output (as though it were an
actual file). Notice how interpret then calls load
in order to read the PHP interpreter's output
into memory.
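To illustrate the pipe idea in isolation, here is a small sketch that runs a command with popen and reads its standard output through the FILE pointer. The helper name run_and_capture and the fixed-size output buffer are assumptions for illustration; the server itself reads via load instead.

```c
#define _XOPEN_SOURCE 700 // assumed feature test macro so that popen is declared

#include <stdio.h>

// Hypothetical helper: runs command via the shell and copies up to size - 1
// bytes of its standard output into output (NUL-terminated).
// Returns the number of bytes captured, or -1 on failure. size must be >= 1.
long run_and_capture(const char *command, char *output, size_t size)
{
    FILE *stream = popen(command, "r");
    if (stream == NULL)
    {
        return -1;
    }

    // read from the process as though it were an actual file
    size_t n = fread(output, 1, size - 1, stream);
    output[n] = '\0';

    if (pclose(stream) == -1)
    {
        return -1;
    }
    return (long) n;
}
```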
- list: A function that generates a
directory listing. Notice how much code it takes to
generate HTML using C, thanks to requisite memory
management. (They pointed out here that with PHP this
part is easier).
- load: This is a function that I
wrote to read all bytes from a web page. For an
html page, it dynamically allocates memory on the heap and
saves a pointer to the web page content and its length;
for a php web page request, the bytes it reads are instead
the output of the php interpreter
running the requested php script. Specifically, load:
1. reads all available bytes from file.
2. stores those bytes contiguously in dynamically allocated memory on the heap.
3. stores the address of the first of those bytes in "*content".
4. stores the number of bytes in *length.
Note that content is a "pointer to a pointer"
(i.e., "BYTE**"), which means that you can
effectively "return" a "BYTE*" to whichever function
calls load by dereferencing content and storing the
address of a BYTE at "*content". Meanwhile, length is
a pointer (i.e., "size_t*"), which you can likewise
dereference in order to "return" a "size_t" to
whichever function calls load, by storing
a number at "*length".
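The four steps above can be sketched as follows. This is a minimal version under assumptions: the chunk size of 512 is illustrative, BYTE is taken to be a char as described earlier, and an empty file yields a NULL "*content" with a "*length" of zero.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BYTES 512 // chunk size per read; the value is an assumption

typedef char BYTE;

// Reads all bytes from file into heap memory, "returning" the buffer via
// *content and its size via *length. The caller must free *content.
bool load(FILE *file, BYTE **content, size_t *length)
{
    if (file == NULL || content == NULL || length == NULL)
    {
        return false;
    }
    *content = NULL;
    *length = 0;

    BYTE buffer[BYTES];
    size_t n;
    while ((n = fread(buffer, 1, BYTES, file)) > 0)
    {
        // grow the heap block so the new chunk stays contiguous
        BYTE *grown = realloc(*content, *length + n);
        if (grown == NULL)
        {
            free(*content);
            *content = NULL;
            *length = 0;
            return false;
        }
        memcpy(grown + *length, buffer, n);
        *content = grown;
        *length += n;
    }
    if (ferror(file))
    {
        free(*content);
        *content = NULL;
        *length = 0;
        return false;
    }
    return true;
}
```

Because load only needs a FILE pointer, the same function works for a file opened with fopen and for a pipe opened with popen.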
- lookup: This is a function I wrote.
It returns:
- "text/css" for any file whose path ends in
".css" (or any capitalization thereof)
- "text/html" for any file whose path ends in ".html"
(or any capitalization thereof)
- "image/gif" for any file whose path ends in
".gif" (or any capitalization thereof)
- "image/x-icon" for any file whose path ends in
".ico" (or any capitalization thereof)
- "image/jpeg" (not image/jpg) for any file whose
path ends in ".jpg" (or any capitalization
thereof)
- "text/javascript" for any file whose path ends
in ".js" (or any capitalization thereof)
- "text/x-php" for any file whose path ends in
".php" (or any capitalization thereof)
- "image/png" for any file whose path ends in
".png" (or any capitalization thereof)
- or NULL otherwise.
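The mapping above can be sketched with a small table and a case-insensitive comparison. This is one possible shape for lookup, not necessarily how I wrote it; it assumes the extension is whatever follows the last "." in the path.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>

// Returns the MIME type for path based on its extension
// (in any capitalization), or NULL if the extension is unsupported.
const char *lookup(const char *path)
{
    static const struct
    {
        const char *extension;
        const char *type;
    }
    types[] = {
        {".css", "text/css"},
        {".html", "text/html"},
        {".gif", "image/gif"},
        {".ico", "image/x-icon"},
        {".jpg", "image/jpeg"},
        {".js", "text/javascript"},
        {".php", "text/x-php"},
        {".png", "image/png"}
    };

    if (path == NULL)
    {
        return NULL;
    }

    // the extension starts at the last '.' in the path, if any
    const char *dot = strrchr(path, '.');
    if (dot == NULL)
    {
        return NULL;
    }

    for (size_t i = 0; i < sizeof types / sizeof types[0]; i++)
    {
        if (strcasecmp(dot, types[i].extension) == 0)
        {
            return types[i].type;
        }
    }
    return NULL;
}
```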
- parse: This is a function that I
wrote to extract the absolute path and query from the
client web browsers web page request line, and return
standard web error messages if the request had an error.
The function parses (i.e., iterates over) the "line"
argument it is given, extracting its absolute path and
query and storing them at "abs_path" and "query",
respectively.
abs_path: Per section 3.1.1 of RFC 7230
(http://tools.ietf.org/html/rfc7230), a request
line is defined as "method SP request-target SP
HTTP-version CRLF", wherein SP represents a single space
(" ") and CRLF represents "\r\n". None of method,
request-target, and HTTP-version may
contain SP. Per section 5.3 of the same RFC,
request-target can take several forms,
the only one of which the server needs to support
is "absolute-path [ '?' query ]", whereby
absolute-path (which will not contain '?') must
start with '/' and might optionally be followed by a
'?' followed by a query, which may not contain
double quotes. We had to ensure that the request line
(which is passed into parse as line) is consistent
with these rules. If it was not, we responded to the
browser with "400 Bad Request" and returned false.
Even if the request line was consistent with these rules,
if method was not GET, we responded to the browser
with "405 Method Not Allowed" and returned false.
If request-target did not begin with '/', we
responded to the browser with "501 Not Implemented"
and returned false. If request-target contained a
double quote, we responded to the browser with
"400 Bad Request" and returned false. If
HTTP-version was not "HTTP/1.1", we responded to the
browser with "505 HTTP Version Not Supported" and
returned false. If all was well, we stored
absolute-path at the address in "abs_path"
(which was also passed into parse as an
argument). We could assume that the memory to which
"abs_path" points was at least of length
"LimitRequestLine + 1".
query: We stored at the address in
query the query substring from request-target. If
that substring was absent (even if a '?' was
present), then query should be the empty string (two
double quotes with nothing between them),
thereby consuming one byte, whereby query[0] is
"\0". We could assume that the memory to which
query points was at least of length "LimitRequestLine + 1".
For instance, if request-target was "/hello.php" or
"/hello.php?", then query should have a value of
the empty string. And if request-target was
"/hello.php?q=Alice", then query had a value of
"q=Alice".
- reason: This function simply
mapped HTTP "status codes" (e.g., 200 ) to "reason
phrases" (e.g., OK ).
- redirect: This function redirects
a client to another location (i.e., URL) by sending a
status code of 301 plus a Location header.
- request: When the server receives
a request from a client, the server does not know in
advance how many characters the request will
comprise. So this function iteratively reads bytes
from the client, one buffer's worth at a time, calling
realloc as needed to store the entire message (i.e.,
request). Notice this function's use of pointers,
dynamic memory allocation, pointer arithmetic, and
more. Ultimately, it keeps reading bytes from the
client until it encounters "\r\n\r\n" (aka CRLF CRLF),
which, according to HTTP's spec, marks the end of a
request's headers. Note that read() is quite like
fread except that it reads from a "file descriptor"
(i.e., an int) instead of from a FILE pointer (i.e.,
"FILE*").
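The read-and-grow loop can be sketched like this. It is a simplified illustration, not the course staff's request function: the name read_headers and the chunk size are assumptions, it ignores the request size limits, and it rescans the whole message after each read for brevity.

```c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BYTES 512 // chunk size per read; the value is an assumption

// Hypothetical helper: reads from fd until CRLF CRLF is seen. Returns a
// heap-allocated, NUL-terminated copy of everything up to and including
// that terminator (length in *length), or NULL on error or early EOF.
char *read_headers(int fd, size_t *length)
{
    char *message = NULL;
    size_t used = 0;

    for (;;)
    {
        char buffer[BYTES];
        ssize_t n = read(fd, buffer, BYTES);
        if (n <= 0)
        {
            free(message);
            return NULL;
        }

        // grow the message by one chunk, plus room for a terminating NUL
        char *grown = realloc(message, used + (size_t) n + 1);
        if (grown == NULL)
        {
            free(message);
            return NULL;
        }
        message = grown;
        memcpy(message + used, buffer, (size_t) n);
        used += (size_t) n;
        message[used] = '\0';

        // stop once the end of the headers has arrived
        char *end = strstr(message, "\r\n\r\n");
        if (end != NULL)
        {
            *length = (size_t) (end - message) + 4;
            message[*length] = '\0';
            return message;
        }
    }
}
```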
- respond: It is this function that
actually sends a client an HTTP response, given a
status code, headers, body, and that body's length.
- Know that dprintf is quite like printf (or, really,
fprintf) except that the former, like read, operates
on a "file descriptor" instead of on a "FILE*".
- start: Start is the function that
configures the server to listen for connections on a
particular TCP port!
- stop: Stop does the opposite,
freeing all memory and ultimately compelling the
server to exit, without even returning control to
main.
- transfer: This function's purpose
in life is to transfer a file from the server to a
client. Whereas interpret handles dynamic content
(generated by PHP scripts), transfer handles static
content (e.g., JPEGs). Notice how this function calls
load in order to read some file from disk.
- urldecode: This function, also
named after a PHP function, URL decodes a string,
converting special characters like "%20" back to
their original values.
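URL decoding can be sketched as below. This is an illustration rather than the course staff's code; like PHP's urldecode, it also turns "+" into a space, and it leaves a "%" that is not followed by two hex digits untouched (both choices are assumptions here).

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

// Returns a heap-allocated, URL-decoded copy of s, or NULL if memory runs
// out. The caller must free the result.
char *urldecode(const char *s)
{
    // decoding never makes the string longer
    char *decoded = malloc(strlen(s) + 1);
    if (decoded == NULL)
    {
        return NULL;
    }
    char *p = decoded;
    while (*s != '\0')
    {
        if (s[0] == '%' && isxdigit((unsigned char) s[1])
                        && isxdigit((unsigned char) s[2]))
        {
            // convert the two hex digits after '%' into one byte
            char hex[3] = {s[1], s[2], '\0'};
            *p++ = (char) strtol(hex, NULL, 16);
            s += 3;
        }
        else if (*s == '+')
        {
            *p++ = ' ';
            s++;
        }
        else
        {
            *p++ = *s++;
        }
    }
    *p = '\0';
    return decoded;
}
```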
- Current Status of Web Server in C:
Works for HTML static page requests, but not for a
PHP request to return a directory, and the simple
Perl script mentioned below has not yet been
tried.
- Possible plans for expanding the web server
implemented in C described above (there are 2 parts to
it):
This part is similar to what I did in the online
class assignment: write a C/C++ program that implements a web
server. This web server will conform to "HTTP/1.x"
for the purposes of client requests, and it will
need to process client HTTP GET requests for web
pages hosted on the server machine. It will need
to use sockets to implement the communication
between a client on one machine and the server on
either the same or a remote machine.
- Add Interaction of physical computing with the web
server: The new part, described in more
detail below, will be adding functionality to the
web server to support
interaction with a physical device, a case where
computers interact with the physical world through a
collection of sensors and actuators. This forms a
physical computing environment.
- Web server Description:
The Basic HTTP Protocol: The basic structure of
interaction between a web client and web server is
as follows:
- Client sends request (from a suitable
browser)
- GET filename HTTP/version
- optional arguments
- a blank line
- Server sends reply
- "HTTP/version" status code status
message
- additional information
- a blank line
- content
- It will need to ensure the information sent
back from the server is formatted as described
above.
- The additional information sent back in a
server reply is of the form
"Content-type: xxx/yyy", where "xxx/yyy" is a
type such as "text/plain", "text/html", "image/gif",
or "image/jpeg".
- The Server
- The Client
The client is any web browser of your
choice.
Requests from the client should be in the
form:
- "http://ip.address.of.server:port-number/request"
"ip.address.of.server" the IP address of
the server machine
"port-number" numeric port on which the
server listens.
Together with the IP address, this
identifies an end point of communication
(or socket) to which the client
connects.
"request" is either a
subdirectory on the server that you wish to
list, the name of an html file, or a cgi
file. In the latter case, a script on the
server is executed to perform
some command. A cgi script,
such as "test.cgi", must be set executable
on the server and must contain a shell or
Perl script such as the following:
"test.cgi" (set executable using chmod 755 "test.cgi"):
#!/bin/sh
# test.cgi -- a simple test
printf "Content-type: text/plain\n\nThis is a test!\n"
To execute a Perl script, you can issue a request such as:
"http://ip.address.of.server:port-number/request.cgi"
where "request.cgi" is an executable Perl
script on the server having contents such
as:
#!/usr/bin/perl
# perl-test.cgi -- a simple Perl script test
print "Content-type: text/plain\n\nThis is a Perl test!\n";
- Basic Test Cases: All of
the following test cases will be supported:
- A request for a directory
listing
- A request for a valid html file, and a request
for a nonexistent one. NOTE: A nonexistent
request corresponds to an HTTP error
status code of 404.
- A request for a static image (in either
gif or jpeg format, having a file ending
of .gif, .jpg or .jpeg)
- A request for a cgi script that
requires execution of a basic shell
command, executed using sh
- A request for a perl script in a cgi
file to process raw data and format it into
an html file.
- A request for a dynamically-created
image using gnuplot on the
server. Information about gnuplot can be
found at:
"http://www.gnuplot.info/"
- For the latter case, above, it is assumed
that the request specifies a cgi file
describing a perl script. The perl script
will process data as described in the next
subsection.
- Dynamic Content using Gnuplot: In this
case, a program called "my-histogram" will be executed
on the server, as follows: "$my-histogram file pattern1 pattern2 ... patternN"
file specifies the name of a file you
wish to search for all occurrences of a
given regular expression pattern or
string sequence. For example:
"$my-histogram file 'and' 'but' 'so' 'he.*lo'"
will tally all occurrences of the words
"and", "but" and "so" in file,
along with all strings that match the
pattern "he.*lo" such as "hello" etc.
You can assume all regular expression
patterns that are acceptable
to grep "-e" are valid. You can assume
the number of pattern
arguments is limited to 5.
- Once "my-histogram" has tallied all
occurrences of the matching strings for
each pattern, the results will be plotted
as a histogram using gnuplot. The output
of "my-histogram" will be piped to
gnuplot using a Perl command as follows:
"open (GNUPLOT, '|gnuplot'); # Notice the vertical bar for a pipe"
After which piping commands to gnuplot is analogous to writing to a file.
"my-histogram" will be written in C, but any
language can be used (Python,
Perl, etc.). You are also free to use
shell commands such as "grep -e" if you
wish, or the built-in Perl regular
expression features.
The output of gnuplot histogram will be formatted
to show "frequency" up
the y axis and the labelled patterns on
the x axis, so there is one frequency bar
per pattern.
- Next, gnuplot will be commanded to
output the histogram to a file that
records the information in gif or jpeg
format.
- After this, the cgi script will send
your gnuplot gif or jpeg image back to
the client for viewing.
- Pretty Printed Output
Just as this webpage has been formatted
using html, an executable on the server
will be invoked as part of your CGI
script to pretty print your
histogram.
- Specifically, the histogram image file
will be embedded in an HTML page that has
a 16pt RED font title and white
background.
The title should read: "My Webserver".
(I may experiment with the generated HTML
content, producing image backgrounds and
additional details. The base case will be
formatted as described,
however.)
- The title will be centered on the
page. Below it, will be a blank line
(spacing of which is your choosing)
followed by the histogram, which is also
centered.
- Advanced Features:
A multi-threaded web
server: Instead of using
"fork()" calls for each client request,
the server will spawn a thread using
my own thread creation routines,
based on the sigaltstack() method.
Specifically, the
"make/get/set/swapcontext()" functions
and any pre-existing thread packages
(e.g., pthreads) will NOT be
used; I will write my own thread management
code instead.
- A web cache: I will
develop a method to cache files in RAM
for subsequent requests. The RAM cache
should be a pool of memory of some
defined size. This will be a tunable
parameter from 4KB to 2MB.
- Upon initialization, the cache is
empty, but gets filled for each file
request until it is full. At that point
I will adopt a simple replacement
strategy of my choice (e.g., first in
first out, random, or least recently
used).
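As a starting point for that plan, here is a rough sketch of a byte-budgeted cache with first in first out replacement. Nothing here exists in the project yet: the type and function names (cache, entry, cache_get, cache_put) are hypothetical, and a linked list with linear search is assumed purely for simplicity.

```c
#include <stdlib.h>
#include <string.h>

// one cached file; entries form a singly linked list, oldest first
typedef struct entry
{
    char *path;
    unsigned char *data;
    size_t size;
    struct entry *next;
}
entry;

// capacity is the tunable pool size in bytes (e.g., 4KB to 2MB)
typedef struct
{
    entry *oldest;
    entry *newest;
    size_t used;
    size_t capacity;
}
cache;

// returns the cached entry for path, or NULL on a miss
const entry *cache_get(const cache *c, const char *path)
{
    for (const entry *e = c->oldest; e != NULL; e = e->next)
    {
        if (strcmp(e->path, path) == 0)
        {
            return e;
        }
    }
    return NULL;
}

// removes the entry that was inserted first
static void evict_oldest(cache *c)
{
    entry *e = c->oldest;
    c->oldest = e->next;
    if (c->oldest == NULL)
    {
        c->newest = NULL;
    }
    c->used -= e->size;
    free(e->path);
    free(e->data);
    free(e);
}

// copies data into the cache, evicting oldest entries until it fits;
// returns 1 if path is cached afterwards, else 0
int cache_put(cache *c, const char *path, const void *data, size_t size)
{
    if (size > c->capacity)
    {
        return 0; // can never fit
    }
    if (cache_get(c, path) != NULL)
    {
        return 1; // already cached
    }
    entry *e = malloc(sizeof *e);
    if (e == NULL)
    {
        return 0;
    }
    e->path = malloc(strlen(path) + 1);
    e->data = malloc(size > 0 ? size : 1);
    if (e->path == NULL || e->data == NULL)
    {
        free(e->path);
        free(e->data);
        free(e);
        return 0;
    }
    strcpy(e->path, path);
    memcpy(e->data, data, size);
    e->size = size;
    e->next = NULL;

    while (c->used + size > c->capacity)
    {
        evict_oldest(c);
    }
    if (c->newest != NULL)
    {
        c->newest->next = e;
    }
    else
    {
        c->oldest = e;
    }
    c->newest = e;
    c->used += size;
    return 1;
}
```

Swapping in least recently used replacement would mainly mean moving an entry to the newest end of the list on each hit in cache_get.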
- I will specify my replacement method in
a README file. To make the cache
beneficial, you should support requests
that are both in the server's filesystem
and also on a remote host.
- Client requests should provide an
optional argument to indicate the remote
host location for files that are not
stored in the server's local
filesystem.
- To test this feature, the server will act
like a client for a remote host machine,
thereby retrieving the necessary file(s)
for placement in the web cache. In turn,
these files will be relayed back to the
original client.
- Server configuration.
For testing purposes, a way will be
provided to disable the above advanced
features, so that the web server falls
back to operating in normal mode (without
web caching and threads). This means I
will produce only one version of my
code. To enable or disable features of my
server, I will either use a configuration
script, pass in command line arguments,
or (worst case) use defined constants
within my code.
- Physical Computing
(NOTE: I may use what I learned in my
embedded systems online class projects
for this section, not sure yet, and that
is described separately).
- To tackle this part of the assignment
requires me to have access to an Arduino
Uno or similar Arduino compatible device.
(In my embedded systems class, we used a
TI Cortex M Arm based microcontroller
Launch board, but I may instead use an
Arduino). These can be purchased for
about "$5.99" (roughly the price of two
coffees) from places such as Microcenter.
You can also buy a good quality starter
kit from Amazon, which is a little more
expensive but includes everything to get
going with some basic building projects.
If I have access to a Raspberry Pi or
other similar single board computer, I
can use that too.
- The idea: as a way to do
physical computing, I will be creative
for this physical computing
section. One idea would be to have an
Arduino board connected via a serial
interface to a server PC running the
web server I have created.
- A client would initiate requests to my
web server, which would then issue
serial commands to my Arduino board.
My Arduino board would run some code (of
my choosing) to perform a control
operation.
- For example, a simple control operation
would be to turn on and off some LEDs, or
to control the speed and angle of a
stepper or servo motor. There are
numerous examples of doing this in my
embedded systems class labs, but I may do
something different here.
- All these operations would be triggered
by client requests via a web browser,
through which commands are sent via my
web server.
- Another option would be to simply use a
web interface to submit requests to
upload Arduino "sketches" to my target
device.
- Here, a sketch is a simple program that
runs out of the ROM space on the Arduino
itself.
- The Arduino IDE is open source and can
be downloaded for most OSes, and there is
a reference guide to explain the simple
programming language available
online.
- Why tackle the physical computing section?
You also get to work with physical
devices which is a lot of fun.
It makes computing feel "real" rather
than abstract. Writing programs
that make computers work is one thing,
but having those computers
control devices is the basis for how
robots, 3D printers, UAVs and all
sorts of physical devices
operate. This physical computing section
also allows me to be creative. I can come up
with any physical computing problem of my
choosing, as long as the control is
initiated through the web
server.
- DEMO (if I ever finish, will post a video):
- Basic test cases for static content
(including directory listings,
correct error reporting, static
images and cgi scripts). This should
include correct usage of sockets
to enable client requests and correct
handling of HTML client request
formats
- Dynamic content including regular
expression handling and histogram and
gnuplot generation
- Pretty printed output, including
correct embedding of images in
HTML-formatted files
- Advanced features: thread management
and the web cache (including proxy
support for remote
requests and cache
replacement)
- Server configuration
- Program style and
documentation (in a README file)