Labnet Administrator's Guide


Table of Contents

1 Introduction
2 System Configuration
2.1 The Master Configuration Database
2.2 The System Configuration Webtool
2.3 Server Configuration Daemon
2.4 Configuration File Management
2.5 Webtool for Managing Files
2.6 Writing Labnet File Templates
2.7 Certificate Management
2.8 Template Aware TFTP Daemon
2.9 Configuring a Diskless Linux Client
3 Image Management Software
3.1 Replication of Master Images
3.2 Windows XP Image Booting
3.3 Windows 7 iSCSI Booting
3.3.1 iSCSI Motivation
3.3.2 iSCSI Implementation
3.3.3 ISCSI Appserver Configuration
3.3.4 ISCSI Appserver Boot Sequence
3.3.5 Booting Windows 7 Client
3.4 Creating Windows 7 iSCSI images
3.5 Windows XP Imaging
3.6 Building Servers
3.7 Network Monitoring
4 Managing the User's Data
4.1 Labnet Software for User Support
4.2 User Authentication
4.3 Ldap Webtools
4.4 User Backups
4.5 The Backup Utility
4.6 Labnet User Printing


Labnet Server and Desktop Management System

User Handbook

by

Michael Rayment

Labnet System Manager

Department of Computer Science


Abstract


Labnet is a computer management system developed at Memorial to manage about 1000 public access computers and 85 servers that provide computing services for the academic community. At its core, Labnet has a customizable database of system configurations for both servers and desktop clients. The database content is updated and managed through a web-based graphical interface. Information from this database is then used to orchestrate a wide range of system configuration operations via Labnet configuration daemons running on each of the servers and desktop clients. This document explores the technologies behind the database and the daemons that govern the many aspects of system configuration, including desktop and server imaging, configuration file management, and user account and backup management. Virtually all aspects of the computers are controlled by the daemons, the information in the underlying database and the software image repository. With Labnet, servers and desktop computers are simply centrally managed network appliances; as such, Labnet is a highly cost-effective strategy for enterprise-level computer management.


1 Introduction


Labnet is a multifaceted software management system that performs desktop and server imaging and configuration, as well as authentication, printer cost recovery and backups. All Labnet operations are centrally controlled by a database that is updated via a user-friendly, web-based content management system. This same database acts as the repository for all computer-specific configuration. Configuration changes made to the database are propagated automatically to the desktop computers and servers, typically within about a minute (see Section 2.3). Software for all Labnet servers, which run Linux, and for client computers, which run both Linux and Windows XP, is imaged from a master software repository under the direction of this database. Similarly, all user data is backed up under the control of the system database. Every facet of the Labnet facility is managed and controlled by the master database.


Labnet currently manages around 85 servers, 100 cluster nodes and 900 desktop computers located in 40 labs across campus. The Labnet user community consists of 15,000 students and 900 faculty and staff. Labnet currently offers 30 terabytes of online data storage to its user community, along with online backups available 24/7 through a user-friendly web interface. This data store is also accessible to users with non-Labnet computers via Nomad.


Labnet has a ubiquitous presence on campus, in student labs, the library, student residences and classrooms, and as such is an ideal vehicle for delivering centralized computing services to the academic community.

The centralized services currently supplied by Labnet are:


  • Single sign-on authentication (based on MUN's my.mun.ca account)

  • User data storage

  • Laptop access to data store

  • User backup retrieval

  • Client software imaging

  • Server software imaging

  • Printer cost recovery

  • Special printers for large format color charts

  • Software license managers

  • User web pages

  • Dual operating system support: Linux and Microsoft Windows


Labnet has proven itself to be a viable tool for administering student labs across campus. The remainder of this document elaborates on the management software, built around a core database of configuration information, that became Labnet. The discussion is broken down into three broad areas: system configuration, system imaging and user data management.


Of course, there are situations in which Labnet would not be appropriate. Labnet is best suited to situations where many homogeneous computers need to be managed remotely. In our case, this describes our computer labs, where a consistent environment is an asset. Another situation that would lend itself to Labnet software management is the deployment of many autonomous processors, linked by Ethernet, used for remote control and data gathering. The management of computers in a school district would be another suitable application.

2 System Configuration

2.1 The Master Configuration Database

The master configuration database was designed to capture the differences among server and desktop computer configurations so that all Labnet computers could be managed centrally. Linux configuration files generally follow one of two basic structural formats: variable-value pairs or tables. For simplicity, our database design represents both formats in a single MySQL table of variable-value pairs, defined below:










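'ent_var_vals' Table Definition:

+----------+-------------+------+-----+-------------------+
| Field    | Type        | Null | Key | Default           |
+----------+-------------+------+-----+-------------------+
| ent_id   | bigint(20)  | NO   | PRI |                   |
| var      | varchar(90) | NO   | PRI |                   |
| indx     | smallint(6) | NO   | PRI |                   |
| val      | text        | NO   |     | NULL              |
| mod_time | timestamp   | NO   | MUL | CURRENT_TIMESTAMP |
+----------+-------------+------+-----+-------------------+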


The 'ent_id' field associates a variable-value pair with a particular instance of a configuration entity. An 'entity' is an instance of a collection of variable-value pairs; it is analogous to a configuration file on a particular host. The 'var' field is the variable name and the 'val' field is the corresponding value. If a variable can have more than one value, the 'indx' field provides an ordering on those values. A typical variable-value style configuration file is therefore represented as a collection of these records sharing a common 'ent_id'; if a variable is multivalued, multiple records with the same 'ent_id' and 'var' fields are present, one for each value. A typical table style configuration file is represented as a series of multivalued variables, one per column, where the variable name is the column name and the index indicates the row number of the cell. For example, the following table:


+--------+---------+
| Color  | Mode    |
+--------+---------+
| Red    | Failed  |
| Yellow | Warning |
| Green  | Okay    |
+--------+---------+

Yields:


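+--------+-------+------+---------+
| ent_id | var   | indx | val     |
+--------+-------+------+---------+
| 23     | Color | 1    | Red     |
| 23     | Color | 2    | Yellow  |
| 23     | Color | 3    | Green   |
| 23     | Mode  | 1    | Failed  |
| 23     | Mode  | 2    | Warning |
| 23     | Mode  | 3    | Okay    |
+--------+-------+------+---------+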


Each record corresponds to one cell of the table: the 'ent_id' field associates the record with a particular entity instance, the 'var' field names the column, and the 'indx' field gives the row number of the cell. The columns are stored serially, one column after another.
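The mapping between a table and 'ent_var_vals' records can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Labnet code: the function names are hypothetical, rows are numbered from 1 as in the Color/Mode example, and each record is a plain (ent_id, var, indx, val) tuple without mod_time. A simple variable-value pair is just a column with a single cell.

```python
from collections import defaultdict

def table_to_records(ent_id, columns):
    """Flatten a table into (ent_id, var, indx, val) records, column by column.

    `columns` maps each column name to its cell values, top row first.
    Rows are numbered from 1, as in the Color/Mode example.
    """
    records = []
    for var, cells in columns.items():
        for indx, val in enumerate(cells, start=1):
            records.append((ent_id, var, indx, val))
    return records

def records_to_table(records):
    """Rebuild {column name: [cells]} from records supplied in any order."""
    cells_by_var = defaultdict(list)
    for _ent_id, var, indx, val in records:
        cells_by_var[var].append((indx, val))
    # Sorting on 'indx' restores the row order within each column.
    return {var: [val for _indx, val in sorted(cells)]
            for var, cells in cells_by_var.items()}

status = {"Color": ["Red", "Yellow", "Green"],
          "Mode": ["Failed", "Warning", "Okay"]}
records = table_to_records(23, status)
print(records[:2])  # [(23, 'Color', 1, 'Red'), (23, 'Color', 2, 'Yellow')]
assert records_to_table(reversed(records)) == status
```

Because the row number travels with every cell, the stored order of the records does not matter; the table can always be reassembled by sorting on 'indx'.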


A numeric 'ent_id' on its own is not a convenient way to refer to a particular configuration entity, so a second MySQL table, described as follows, gives a name to these entities, or collections of variable-value pairs:


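'entity' Table Definition:

+--------+-------------------------------------------+------+-----+---------+
| Field  | Type                                      | Null | Key | Default |
+--------+-------------------------------------------+------+-----+---------+
| name   | varchar(40)                               | NO   | PRI |         |
| type   | enum('client','server','standalone', ...) | NO   | PRI |         |
| ent_id | bigint(20)                                | NO   | MUL |         |
| status | enum('active','inactive')                 | NO   |     | active  |
+--------+-------------------------------------------+------+-----+---------+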


Here, 'name' is the name associated with the collection of variable-value pairs for all records having a matching 'ent_id'. A final table, 'var_defs', is used by the web interface to manipulate and display the values in the database. The 'type' field associates an entity with the collection of variables, and their display characteristics, defined for that type in the 'var_defs' table, as follows:


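'var_defs' Table Definition:

+-------------+----------------------------------------------------+------+-----+---------+
| Field       | Type                                               | Null | Key | Default |
+-------------+----------------------------------------------------+------+-----+---------+
| entity_type | enum('client','server','standalone','printer', … ) | NO   | PRI | NULL    |
| name        | varchar(90)                                        | NO   | PRI |         |
| view_type   | enum('select','input','selectonly', … )            | NO   |     | NULL    |
| link_type   | enum('client','server','standalone', … )           | YES  |     | NULL    |
| view_group  | enum('uniq','Fstab','Rsync', … )                   | YES  |     | NULL    |
| attributes  | set('mandatory','single','unique','table')         | YES  |     | NULL    |
| regex       | varchar(255)                                       | YES  |     | NULL    |
| query       | text                                               | YES  |     | NULL    |
| indx        | int(11)                                            | YES  |     | 0       |
| tooltip     | varchar(2000)                                      | YES  |     | NULL    |
+-------------+----------------------------------------------------+------+-----+---------+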

Here, 'entity_type' matches the enumeration of 'type' in the 'entity' table above, and 'name' is the name of the variable being defined. For each variable that can be a member of an entity of a given type, there is a corresponding record in this table that names the variable and describes its attributes. The next four fields (view_type, link_type, view_group and attributes) control how the data appears on the screen (i.e., whether it is a table column or a simple variable-value pair, whether it is multivalued, and so on). The 'regex' field holds a regular expression that a value must match in order to be accepted. The 'query' field can define an SQL query that enumerates the acceptable values. The 'indx' field is used as a hint for positioning the variable on the web display. Finally, the 'tooltip' field specifies the text that appears when the mouse hovers over the variable; it usually contains hints about what the variable should contain.
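How the 'attributes', 'regex' and 'query' fields might be applied when a value is submitted can be sketched as follows. The VarDef class and validate() function are hypothetical; the `allowed` set stands in for the rows that the 'query' field would return; and the readings of 'mandatory' (at least one value) and 'single' (at most one value), as well as the use of a full-string regex match, are assumptions. The 'unique' and 'table' attributes are not modelled, and the two example definitions are illustrative only.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Set

@dataclass
class VarDef:
    """Hypothetical in-memory copy of one 'var_defs' row."""
    name: str
    attributes: Set[str] = field(default_factory=set)  # e.g. {"mandatory", "single"}
    regex: Optional[str] = None
    allowed: Optional[Set[str]] = None  # stands in for the rows the 'query' returns

def validate(vdef: VarDef, values: List[str]) -> List[str]:
    """Return a list of problems; an empty list means the values are acceptable."""
    problems = []
    if "mandatory" in vdef.attributes and not values:
        problems.append(f"{vdef.name}: a value is required")
    if "single" in vdef.attributes and len(values) > 1:
        problems.append(f"{vdef.name}: only one value is allowed")
    for value in values:
        # Assumption: the whole value must match, hence fullmatch() rather than search().
        if vdef.regex is not None and re.fullmatch(vdef.regex, value) is None:
            problems.append(f"{vdef.name}: {value!r} does not match {vdef.regex}")
        if vdef.allowed is not None and value not in vdef.allowed:
            problems.append(f"{vdef.name}: {value!r} is not an accepted value")
    return problems

# Illustrative definitions, not taken from the real 'var_defs' table.
ip = VarDef("ip", {"mandatory", "single"}, regex=r"\d{1,3}(\.\d{1,3}){3}")
room = VarDef("room", {"mandatory"}, allowed={"UK0000", "EN1018"})
print(validate(ip, ["134.153.83.151"]))  # []
print(validate(room, ["XX9999"]))        # ["room: 'XX9999' is not an accepted value"]
```

Returning a list of problems rather than raising on the first one lets a web form report every invalid field at once.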


There are currently 60,000 variable-value pairs in the database, in 1700 different entities of 21 types. Some of the more salient entity types are as follows:


  1. client – describes the information that is specific to a diskless client computer

  2. server – describes the information specific to a managed server.

  3. printer – a collection of printer specific information

  4. standalone – basic information about computers in a Labnet vlan that are not fully managed

  5. services – information about the services that run on the servers

  6. rsyncImage – information about the different images

  7. fileMgr – information about the Labnet managed files


These entity types will be described in greater detail in the following discussions.
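Resolving an entity name to its values takes a join between the 'entity' and 'ent_var_vals' tables. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL, so the enum columns become plain TEXT and only a subset of the 'discovery' records is loaded; the entity_values() helper is hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL here; enum columns become plain TEXT.
SCHEMA = """
CREATE TABLE entity (name TEXT, type TEXT, ent_id INTEGER,
                     status TEXT DEFAULT 'active', PRIMARY KEY (name, type));
CREATE TABLE ent_var_vals (ent_id INTEGER, var TEXT, indx INTEGER, val TEXT NOT NULL,
                           mod_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
                           PRIMARY KEY (ent_id, var, indx));
"""

def entity_values(conn, entity_type, name):
    """Return {var: [values in indx order]} for the named entity of the given type."""
    rows = conn.execute(
        """SELECT v.var, v.val
             FROM entity e JOIN ent_var_vals v ON v.ent_id = e.ent_id
            WHERE e.type = ? AND e.name = ?
            ORDER BY v.var, v.indx""",
        (entity_type, name),
    )
    result = {}
    for var, val in rows:
        result.setdefault(var, []).append(val)
    return result

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO entity (name, type, ent_id) VALUES ('discovery', 'standalone', 1477)")
conn.executemany(
    "INSERT INTO ent_var_vals (ent_id, var, indx, val) VALUES (1477, ?, ?, ?)",
    [("cn_name", 0, "mail"), ("cn_name", 1, "www"), ("room", 0, "UK0000")],
)
print(entity_values(conn, "standalone", "discovery"))
# {'cn_name': ['mail', 'www'], 'room': ['UK0000']}
```

Because the entity key is the (name, type) pair, the same name can be reused under different entity types without collision.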

2.2 The System Configuration Webtool

A web-based interface was developed to display and edit the information in the master configuration database. The following screenshot shows the layout of the main features of this tool:




The most basic request is to view an existing entity, which can be done using the commands in the Entity Maintenance box: select the entity type from the drop-down menu, enter the entity name and submit the request by clicking the View Entity button. Selecting type 'standalone' and entering 'discovery' displays the following entity information:





This is a very simple entity consisting of 8 variables and their values, plus a two-column table with columns named cn_name and cn_domain. The variables are divided into two categories, Mandatory and Optional, which indicate whether a variable must have a value in order to produce a viable (active) entity. The following table lists the records in the ent_var_vals table that correspond to this display:


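ent_var_vals table entries for standalone entity 'discovery':

+--------+------------------+------+----------------+---------------------+
| ent_id | var              | indx | val            | mod_time            |
+--------+------------------+------+----------------+---------------------+
| 1477   | cn_domain        | 0    | ichc2009.ca    | 2009-05-15 16:32:00 |
| 1477   | cn_domain        | 1    | ichc2009.ca    | 2009-05-15 16:32:00 |
| 1477   | cn_name          | 0    | mail           | 2009-05-15 16:32:00 |
| 1477   | cn_name          | 1    | www            | 2009-05-15 16:32:00 |
| 1477   | domain           | 0    | housing.mun.ca | 2007-09-28 01:44:27 |
| 1477   | ip               | 0    | 134.153.83.151 | 2007-08-23 23:48:21 |
| 1477   | nms_contactgroup | 0    | housing_admins | 2007-08-23 23:48:22 |
| 1477   | room             | 0    | UK0000         | 2007-08-23 23:48:21 |
+--------+------------------+------+----------------+---------------------+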



Pressing the Use as Template button creates a new entity using these values as defaults. To modify the data stored in an entity, choose the Modify Entity button instead of the View Entity button, which produces the following screen:



Note the column of check boxes on the left, which indicates whether or not each field has been modified. Whenever a field in the third column is changed, its check disappears, indicating that the field has changed; clicking an unchecked box restores the field to its original value. When all the values are satisfactory, click Submit Changes to update the database. The input style varies by variable: there are drop-down menus (server, nms_contactgroup), input boxes (mac_address), multivalued select boxes (cname) and a two-column table with a Commit button for inserting rows and a Remove button in front of each row for deleting it. Individual table cells can also be modified.


To add a new variable to this standalone entity type, click on the Main Menu button followed by the Variable Maintenance button located on the main page. The following entity type select page will be displayed:




In this case, we would select the 'standalone' entity type from the drop down menu and click on the Next button which would cause the following page to be displayed:




This table displays the variable definitions and the descriptive properties that control how variables are displayed and manipulated on the entity update page. Clicking a variable name in the left column allows that variable's parameters to be edited, and clicking the Add Variable button defines a new variable. The following page is displayed if the cn_domain button is selected:



If a new entity type is required or an old entity type requires removal, then clicking on the Entity Type Maintenance button from the main page would display the following page:




A new entity can be created by clicking the Entity Creation button on the main page.


The Entity Search button on the main page is very useful for browsing and searching for various entities based on a variety of search criteria. This feature can also be used to perform batch edit changes and additions to the database based on the results of the variable search. Clicking it will bring you to the following entity type select page:




For this demonstration, suppose that we are bringing up a new DHCP server called 'nermal' and that we would like all the standalone entities in the cs.mun.ca domain located in room EN1018 to get their DHCP information from this new server. The 'server' variable in the standalone entity controls which server manages the client, so we must set the 'server' variable to 'nermal' for all the standalone entities that match these search criteria. To do this, we first go to the page that sets up the search by selecting standalone from the drop-down menu and clicking the Variable Search button. The following page is displayed:





As before, there is a column of check boxes on the left. Checking one or more of these boxes selects the variables that will be displayed and edited in batch edit mode; for this example, we check the box alongside the 'server' variable. Next, we set up the search criteria by selecting the values that must match from the input options in the right-hand column. A matching entity must have all the selected values, i.e., the criteria are combined with a logical AND; there is currently no way to perform an OR search. For this example, we select the 'cs.mun.ca' domain from the drop-down menu to the right of the 'domain' variable and room 'EN1018' from the drop-down menu to the right of the 'room' variable. All that remains is to click the Search button at the top or bottom of the screen. In this example, the following page is displayed:




The check boxes on the left-hand side of the table indicate which entities will take part in the batch edit; to exclude an entity, simply click its check box to clear it. The next column lists the entities that matched the search criteria. Each entity name is a button that leads to the entity update page for that entity, which is useful for making manual changes to a single entity. The remaining columns show the current values of the variables selected for editing; since we selected only the 'server' variable, this is the list of the corresponding 'server' values. Assuming that we want to batch edit all the listed standalone entities, we simply click the Batch Edit button, which in this case results in the following page:




In the above example, the entities that will be affected by this operation are listed at the top, followed by their entity type. The remaining entries are the variables to be changed; since we chose only one variable, 'server', it is the only entry displayed. At this point, we either use the drop-down menu to select the new value or type a different value into the adjacent input box. For a multivalued variable, a multivalued edit box would be displayed instead.
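The search-then-batch-edit workflow amounts to an AND filter over entity values followed by an assignment to each selected entity. The sketch below models entities as in-memory dictionaries rather than database rows; the helper functions, the host names, the room 'EN2036' and the old server name 'oldsrv' are hypothetical, and only 'nermal', 'cs.mun.ca' and 'EN1018' come from the example above.

```python
def search(entities, criteria):
    """Names of entities whose values satisfy *all* criteria (a logical AND)."""
    return sorted(
        name for name, values in entities.items()
        if all(wanted in values.get(var, []) for var, wanted in criteria.items())
    )

def batch_edit(entities, names, var, new_value):
    """Replace `var` with a single new value on every selected entity."""
    for name in names:
        entities[name][var] = [new_value]

# Hypothetical standalone entities: name -> {variable: [values]}
standalone = {
    "pc-a": {"domain": ["cs.mun.ca"], "room": ["EN1018"], "server": ["oldsrv"]},
    "pc-b": {"domain": ["cs.mun.ca"], "room": ["EN2036"], "server": ["oldsrv"]},
    "pc-c": {"domain": ["cs.mun.ca"], "room": ["EN1018"], "server": ["oldsrv"]},
}
selected = search(standalone, {"domain": "cs.mun.ca", "room": "EN1018"})
print(selected)  # ['pc-a', 'pc-c']
batch_edit(standalone, selected, "server", "nermal")
```

Clearing a check box on the search results page corresponds to removing that name from `selected` before calling batch_edit().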


Batch editing tables is currently a bit cumbersome and limited but can be done using the Table Search button instead of the Variable Search button on the Entity Search page shown above.

2.3 Server Configuration Daemon

Once the database of configuration information has been established, some method is needed to propagate this information to the client and server computers. Additionally, this information must be presented in a way that is compatible with existing client and server applications and administrative operations. To accomplish this, each server runs a configuration daemon called 'lnsysconfigd'.


One of its duties is to maintain an up-to-date copy of all the database configuration variable-value pairs as XML files. Client computers can read these XML files because they share NFS access to the same files used by their controlling application server. This gives every server and client computer access to configuration information regardless of whether the database is reachable, making them much more resilient to failures. The XML files are laid out in a directory hierarchy rooted, by default, at '/usr/local/etc/labnet'. The information for each entity is stored in a file named after the entity, in a directory named after the entity type. For example, the information for the entity 'discovery', of entity type 'standalone', can be found in the file '/usr/local/etc/labnet/standalone/discovery'. The following is the XML rendering of the database information for 'discovery':


<?xml version="1.0"?>
<Entity_Profile>
  <entity>discovery</entity>
  <type>standalone</type>
  <ConfigurationItem>
    <variable>domain</variable>
    <value indx="1">housing.mun.ca</value>
  </ConfigurationItem>
  <ConfigurationItem>
    <variable>room</variable>
    <value indx="1">UK0000</value>
  </ConfigurationItem>
  <ConfigurationItem>
    <variable>ip</variable>
    <value indx="1">134.153.83.151</value>
  </ConfigurationItem>
  <ConfigurationItem>
    <variable>cn_name</variable>
    <value indx="1">mail</value>
    <value indx="2">www</value>
  </ConfigurationItem>
  <ConfigurationItem>
    <variable>cn_domain</variable>
    <value indx="1">ichc2009.ca</value>
    <value indx="2">ichc2009.ca</value>
  </ConfigurationItem>
  <ConfigurationItem>
    <variable>nms_contactgroup</variable>
    <value indx="1">housing_admins</value>
  </ConfigurationItem>
</Entity_Profile>


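Compare this with the ent_var_vals table entries for the standalone entity 'discovery' shown in Section 2.2. All of the configuration information is captured in the XML format, except that the ent_id has been replaced by the entity name and type from the entity table and the mod_time values are omitted. Note also that in this example the indx values start at 1 in the XML file but at 0 in the database.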
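A client that only has the NFS-mounted XML files can locate and read an entity as sketched below. The directory layout follows the text; the function names are hypothetical, and the parser assumes exactly the element names shown in the 'discovery' example. Values are sorted on their indx attribute rather than trusting document order.

```python
import posixpath
import xml.etree.ElementTree as ET

LABNET_ROOT = "/usr/local/etc/labnet"  # default root given in the text

def entity_path(entity_type, name, root=LABNET_ROOT):
    """Path of an entity's XML file: <root>/<entity type>/<entity name>."""
    return posixpath.join(root, entity_type, name)

def parse_profile(xml_text):
    """Return (entity, type, {variable: [values in indx order]}) from an Entity_Profile."""
    profile = ET.fromstring(xml_text)
    values = {}
    for item in profile.findall("ConfigurationItem"):
        ordered = sorted(item.findall("value"), key=lambda v: int(v.get("indx")))
        values[item.findtext("variable")] = [v.text for v in ordered]
    return profile.findtext("entity"), profile.findtext("type"), values

SAMPLE = """<?xml version="1.0"?>
<Entity_Profile>
  <entity>discovery</entity>
  <type>standalone</type>
  <ConfigurationItem>
    <variable>cn_name</variable>
    <value indx="1">mail</value>
    <value indx="2">www</value>
  </ConfigurationItem>
</Entity_Profile>"""

print(entity_path("standalone", "discovery"))  # /usr/local/etc/labnet/standalone/discovery
print(parse_profile(SAMPLE))  # ('discovery', 'standalone', {'cn_name': ['mail', 'www']})
```

In practice the text would be read from entity_path(...) with an ordinary file open; the inline sample keeps the sketch self-contained.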


Once every 24 hours, or whenever 'lnsysconfigd' starts up or receives a SIGUSR1 signal, 'lnsysconfigd' does a complete sanity check to ensure that the database information is congruent with the XML files. It also performs sanity checks on other functions that it manages.


Unlike most daemons that wait for events to trigger actions, 'lnsysconfigd' sleeps for 1 minute and then polls the master database for changes. This means that the onus is on the servers to ensure that their data is up to date, thereby freeing the database from having to keep track of which computers have or have not been updated. Once 'lnsysconfigd' has initialized itself, it performs the following tasks in an infinite loop:


  1. Sleeps for 60 seconds.

  2. Checks whether the statistics reporting interval, 5 minutes by default, has expired. If it has, it evaluates the following information and reports it back to the network monitoring database:

    • Percentage of disk space and inode utilization for all mounted disks

    • CPU load average normalized for the number of processors

    • Status of the software RAID devices

    • Status of daemons

    • Whether or not each disk partition is scheduled for an fsck

    • Percentage of network bandwidth utilized for transmitting and receiving

  3. Checks whether the sanity check interval, 24 hours by default, has expired. If it has, a complete sanity check of the MySQL database values against the XML files is performed and the daemon continues at step 7.

  4. Looks up the most recent database change time stamp and compares it with the local copy stored in the file /usr/local/etc/last_query.conf. If no change has occurred, it goes back to sleep at step 1.

  5. Downloads a list of all the changed variables from the database.

  6. Applies each of these changes to the XML data files.

  7. Updates any managed files that have changed.

  8. Processes all managed template files, yielding their corresponding derived files based on current database values. Each derived file is compared with the existing file and installed if different.

  9. Runs any scripts that need to be run when a file is updated.

  10. Checks the server certificates and updates them if necessary.

  11. Runs server function specific configuration. The following server functions trigger actions:

    • appserver – triggers the configuration of clients supported by this server

    • authdns – triggers updates to our authoritative MyDNS database

    • masterldap – triggers creation of special machine accounts

  12. Updates changes in the runlevel directories.

  13. Goes back to step 1.
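The loop above, including the SIGUSR1-triggered sanity check, could be structured as in the following Python sketch. It is not the lnsysconfigd source: the ConfigPoller class, its method names and the callables passed to it are hypothetical placeholders for the real work (the statistics report of step 2, the sanity check of step 3, the time stamp query of step 4, steps 5 and 6, and steps 7 to 12), and the clock is injectable so that a single pass can be exercised without waiting.

```python
import signal
import time

class ConfigPoller:
    """Hypothetical skeleton of the lnsysconfigd main loop, not the real daemon."""

    def __init__(self, report_stats, sanity_check, latest_change, sync_changes,
                 update_host, stats_interval=5 * 60, sanity_interval=24 * 3600,
                 clock=time.time):
        self.report_stats = report_stats    # step 2: disk, load, RAID, daemons, network
        self.sanity_check = sanity_check    # step 3: full MySQL vs. XML comparison
        self.latest_change = latest_change  # step 4: newest change time stamp in the DB
        self.sync_changes = sync_changes    # steps 5-6: download changes, update XML
        self.update_host = update_host      # steps 7-12: files, templates, scripts, ...
        self.stats_interval = stats_interval
        self.sanity_interval = sanity_interval
        self.clock = clock
        # -inf forces a statistics report and a sanity check on the first pass,
        # matching the sanity check performed at start-up.
        self.last_stats = self.last_sanity = float("-inf")
        self.last_seen = None  # plays the role of /usr/local/etc/last_query.conf
        self.sanity_requested = False

    def request_sanity_check(self, signum=None, frame=None):
        """SIGUSR1 handler: only sets a flag; the work happens in the loop."""
        self.sanity_requested = True

    def run_once(self):
        """One pass of steps 2-12; returns what was done, for logging or tests."""
        now = self.clock()
        if now - self.last_stats >= self.stats_interval:
            self.report_stats()
            self.last_stats = now
        if self.sanity_requested or now - self.last_sanity >= self.sanity_interval:
            self.sanity_check()
            self.last_sanity, self.sanity_requested = now, False
            self.update_host()  # step 3 continues directly at step 7
            return "sanity"
        newest = self.latest_change()
        if newest == self.last_seen:
            return "idle"  # nothing changed: back to sleep
        self.sync_changes(self.last_seen)
        self.last_seen = newest
        self.update_host()
        return "updated"

    def run_forever(self):
        signal.signal(signal.SIGUSR1, self.request_sanity_check)
        while True:
            time.sleep(60)  # step 1
            self.run_once()
```

Keeping the signal handler down to a flag assignment avoids doing database or file work inside the handler, and polling keeps the database free of any per-host bookkeeping, as the text notes.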

2.4 Configuration File Management

One of the most important operations of 'lnsysconfigd' is the management of client and server configuration files. A special entity type, 'fileMgr', has been created to direct this operation. Every managed file on the Labnet servers and clients has a fileMgr entity named after the file itself. The following is a webtool rendering of the 'access.conf' fileMgr entity:


Figure xx:



The first four mandatory variables describe the usual file path, ownership and permissions information for this file. The optional 'script' variable tells 'lnsysconfigd' to run the specified script whenever this particular file changes. The FileTable table indicates which file instances should exist on which sets of computers. New rows cannot be entered into this table directly, because a hash value must be computed for each file instance; a special webtool, described later, handles the updating, deleting and creating of managed file instances.

In this example, the computers that belong to the 'mscluster' group will all have a '/etc/security/access.conf' file whose hash is GpxEs3Rce+D53iStS1kipZxuC7E= when it is up to date. By computing and comparing file hashes, 'lnsysconfigd' can identify which files need to be updated. If the computed hash matches neither the new nor the old hash value, 'lnsysconfigd' assumes that someone has made a manual local change to the file and makes a backup copy named with the suffix '.lnsyncsave'.

The other two file instances are templates, so their hashes match the template file rather than the installed file. Template file hashes are compared and updated in the same way as non-template files, but the template is then passed through a macro processor that substitutes information from the database according to the template's macro commands, and the resulting file is installed at the specified destination. The template file associated with the 'mun@site_server' instance is as follows:


<Template>

<setdefaults><type>server</type><entity><self/></entity>

# DO NOT EDIT THIS FILE

#

# This file was generated automatically by the program:

# /usr/local/sbin/lnsysconfigd

# and should not be modified directly.

#


# Login access control table.

#

################################################################

<if><varVal>ac_permission</varVal><then>


<iterateOverLists>

<iteratorList><varVal>ac_permission</varVal></iteratorList>

<iteratorList><varVal>ac_users</varVal></iteratorList>

<iteratorList><varVal>ac_location</varVal></iteratorList>


<if> <equal><item/><literal>allow</literal></equal> <then>


<literal>+: </literal><item>1</item><literal>: </literal><item>2</item>

<literal>

</literal>

</then>

<elseif> <equal><item/><literal>deny</literal></equal> <then>

<literal>-: </literal><item>1</item><literal>: </literal><item>2</item>

<literal>

</literal>

</then>

</elseif>

</if>

</iterateOverLists>

</then>

<else>

<literal>

+: root wheel: LOCAL .cs.mun.ca .math.mun.ca .pcglabs.mun.ca .</literal><varVal><type>server</type><entity><self/></entity>domain</varVal>

<literal>

-: ALL: ALL

</literal>

</else>

</if>

</setdefaults>

</Template>


In this example, the macro processor first checks whether this server has any Access table entries (the ac_permission variable). If not, it emits a default set of rules. If it does, the processor iterates over the table rows, filling in the permission, user and location fields from the values in the server's Access table. A useful Labnet utility shows what file will be produced from a specific template on a specific server or client. It is invoked as follows:


templateCat templatefile hostname (client|server)


The templateCat output from the template above for one of our servers is:


# DO NOT EDIT THIS FILE
#
# This file was generated automatically by the program:
# /usr/local/sbin/lnsysconfigd
# and should not be modified directly.
#

# Login access control table.
#
################################################################
+: root wheel: LOCAL .cs.mun.ca .math.mun.ca .pcglabs.mun.ca
+: pbonnah: .ucs.mun.ca .cc.mun.ca
+: bkupclient: .mun.ca
-: ALL: ALL


Note how the macro substitutions embed themselves within the boiler plate text.
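For readers unfamiliar with the macro language, the template's logic can be restated in plain Python. This is a hypothetical equivalent, not the Labnet macro processor: it omits the header comment, ignores the processor's exact whitespace handling, and the sample rows are inferred from the templateCat output above rather than read from the Access table itself.

```python
def render_access_conf(rows, domain):
    """Plain-Python restatement of the access.conf template's rule section.

    rows   -- (ac_permission, ac_users, ac_location) triples from the Access table
    domain -- the server's 'domain' variable, used only by the default rules
    """
    lines = []
    if rows:  # <if><varVal>ac_permission</varVal><then>
        for permission, users, location in rows:  # <iterateOverLists>
            if permission == "allow":
                lines.append(f"+: {users}: {location}")
            elif permission == "deny":
                lines.append(f"-: {users}: {location}")
            # Any other permission value produces no line, as in the template.
    else:  # <else>: fall back to the default rules
        lines.append("+: root wheel: LOCAL .cs.mun.ca .math.mun.ca .pcglabs.mun.ca ." + domain)
        lines.append("-: ALL: ALL")
    return "\n".join(lines) + "\n"

# Rows inferred from the templateCat output shown above.
rows = [
    ("allow", "root wheel", "LOCAL .cs.mun.ca .math.mun.ca .pcglabs.mun.ca"),
    ("allow", "pbonnah", ".ucs.mun.ca .cc.mun.ca"),
    ("allow", "bkupclient", ".mun.ca"),
    ("deny", "ALL", "ALL"),
]
print(render_access_conf(rows, "cs.mun.ca"), end="")
```

The three iterator lists in the template correspond to the three fields of each tuple: `<item/>` is the permission, `<item>1</item>` the users and `<item>2</item>` the location.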


The corresponding contents of the Access table for the specified server are as follows:




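The hash comparison applied to non-template files, described earlier in this section, can be sketched as follows. The 28-character hash shown for the 'mscluster' instance is consistent with a base64-encoded 20-byte digest such as SHA-1, but the choice of SHA-1 here is an assumption, as are the function names and the action labels; that the managed version is installed after the '.lnsyncsave' backup is made is also presumed rather than stated in the text.

```python
import base64
import hashlib

def file_hash(data: bytes) -> str:
    """Base64-encoded SHA-1 digest (assumed format: 28 characters ending in '=')."""
    return base64.b64encode(hashlib.sha1(data).digest()).decode("ascii")

def plan_update(local_data: bytes, new_hash: str, old_hash: str) -> str:
    """Classify a managed, non-template file against its FileTable hashes."""
    current = file_hash(local_data)
    if current == new_hash:
        return "up-to-date"
    if current == old_hash:
        return "install"            # unmodified previous version: safe to replace
    return "backup-then-install"    # local edit: save a .lnsyncsave copy first

old, new = b"-: ALL: ALL\n", b"+: root: LOCAL\n-: ALL: ALL\n"
print(len(file_hash(new)))                                       # 28
print(plan_update(old, file_hash(new), file_hash(old)))          # install
print(plan_update(b"edited\n", file_hash(new), file_hash(old)))  # backup-then-install
```

Storing both the old and the new hash is what lets the daemon tell a stale-but-untouched file apart from one that someone edited by hand.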

The final aspect, alluded to above but not yet fully explained, is how 'lnsysconfigd' determines which FileTable instance to use on a given server or client computer. In the example above, computers belonging to the 'mscluster' group get that particular instance, all other servers get the 'mun@site_server' instance, and all client computers get the 'mun@site_client' version of the template file. To determine which instance a specific computer uses, the algorithm starts at the most specific instance name and works towards the most generic one; the first matching file instance is used on that computer. To generate the candidate names, each computer entity has variables that identify it as a member of a file group, department or site. However, a few noteworthy categories have not yet been mentioned. The following tables list all the possible instance names and their meanings:


Server computers:

File Instance Format           Membership
<group name>                   All computers that are members of this group get this file
<server_name>@host_server      Specific only to the server host specified
<server_name>@host             Specific only to the server host specified but is copied to the client image to be used by all the clients supported by this application server
<department name>@dept_server  Specific only to the servers that belong to a particular department
<department name>@dept         Specific only to the servers that belong to a particular department but is copied to the client image to be used by all the clients supported by this application server
<site name>@site_server        Specific only to the servers that belong to a particular site
<site name>@site               Specific only to the servers that belong to a particular site but is copied to the client image to be used by all the clients supported by this application server
default@labnet_server          All server computers get this file by default
default@labnet                 All clients and servers get this file




Client computers:

File Instance Format           Membership
<group name>                   All computers that are members of this group get this file
<client_name>@host_client      Specific only to the client host specified
<appserver_name>@host_client   Specific to all clients served by this appserver
<client_name>@host             Specific only to the server host specified but is copied to the client image to be used by all the clients supported by this application server
<department name>@dept_client  Specific only to the clients that belong to a particular department
<department name>@dept         Specific only to the servers that belong to a particular department but is copied to the client image to be used by all the clients supported by this application server
<site name>@site_client        Specific only to the clients that belong to a particular site
<site name>@site               Specific only to the servers that belong to a particular site but is copied to the client image to be used by all the clients supported by this application server
default@labnet_client          All client computers get this file by default
default@labnet                 All clients and servers get this file


The file instance for a specific host is determined by starting at the top of the relevant table and trying each entry in turn, stopping at the first match. For example, if the 'access.conf' file is being determined for host 'hogwarts', the first test is whether 'hogwarts' is a member of any group associated with 'access.conf'. If so, that group instance is used. Failing that, the algorithm checks for an instance named 'hogwarts@host_client', and so on. Since 'hogwarts' is, in this case, a client computer, the test for 'hogwarts@host_server' is skipped, as are all names with the '_server' suffix.
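The first-match lookup just described can be sketched in Python. The function names are hypothetical, the candidate order is copied from the server and client tables above (including the '<client_name>@host' entry exactly as listed), and the host's department, site and application server names are passed in directly rather than read from the database; 'someapp', 'cs' and 'node1' are illustrative values.

```python
def candidate_names(role, host, appserver, dept, site):
    """Instance names to try after the group check, most specific first.

    role is "server" or "client"; the order copies the tables above.
    """
    if role == "server":
        return [f"{host}@host_server", f"{host}@host",
                f"{dept}@dept_server", f"{dept}@dept",
                f"{site}@site_server", f"{site}@site",
                "default@labnet_server", "default@labnet"]
    return [f"{host}@host_client", f"{appserver}@host_client", f"{host}@host",
            f"{dept}@dept_client", f"{dept}@dept",
            f"{site}@site_client", f"{site}@site",
            "default@labnet_client", "default@labnet"]

def resolve_instance(file_id, instances, host_groups, candidates):
    """Return the FileTable instance a host should use, or None if nothing matches.

    instances   -- instance names present in the file's FileTable
    host_groups -- {file_id: group name} from the host's Config_Files_Table
    """
    group = host_groups.get(file_id)
    if group is not None and group in instances:  # group membership is tried first
        return group
    return next((name for name in candidates if name in instances), None)

access_conf = {"mscluster", "mun@site_server", "mun@site_client"}
hogwarts = candidate_names("client", "hogwarts", "someapp", "cs", "mun")
print(resolve_instance("access.conf", access_conf, {}, hogwarts))  # mun@site_client
```

Because a client's candidate list never contains '_server' names, the skipping of server-only instances described above falls out of how the list is built.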


The host name, department name and site name values used in the matching process are obtained from the host entity name, the value of the dept variable for this entity and the value of the site variable for this entity, respectively. Group associations are made by associating a file_id (the name of the fileMgr entity) with a group name in the Config_Files_Table table of the particular client or server entity. For example, the following table entry exists on all 'mscluster' servers:




2.5 Webtool for Managing Files

As alluded to earlier, the fileMgr entities have a special web page interface that must be used to update old files or create new files. This interface is also responsible for updating the file hashes. It can be invoked from the main page by clicking on the Update Config Files button which loads the following page:




There are two alternatives at this point: modify an existing file or create a new fileMgr entity. Selecting the Create New File button does not actually create a new file; it is a shortcut to the entity creation page, where we can create the fileMgr entity for the new file that is to come under Labnet management. Note that only files that have entries in the fileMgr entity table can be managed by 'lnsysconfigd'. If an existing file entity is selected from the Select a File drop down menu, the list of its existing file instances, if any, is displayed along with a Create New Instance button for creating a new file instance. Selecting the access.conf entity name from the drop down menu loads the following page:




This indicates that there are three instances of the file already in existence: a template for clients, a template for servers and a regular file for all 'mscluster' computers. To edit an existing configuration file instance, simply click on the select button to the left of the corresponding file instance name. Before clicking on the Confirm button at the bottom, select the host on which you wish to edit the configuration file. Clicking on Confirm loads the following page:




We are now ready to edit the file on the computer 'odie'. As the messages in blue indicate, the current content of the file has been deposited in the /var/upload directory on odie and is available for editing. Once we have updated the file, we return to the webtool, where an SVN log message field must be filled out before the changes can be uploaded. This is because all configuration changes are committed to an SVN revision control repository so that they are logged for future reference. Once the log message has been added, we can submit the change using either of the Update Config buttons. This places the new file in the master config file directory and updates the file hashes in the configuration database. As part of their file management operation, the 'lnsysconfigd' daemons running on all the servers then automatically sync down the latest master config directory using the rsync program.
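A daemon can use the recorded hashes to tell whether an installed copy is stale. The sketch below assumes SHA-256 and uses hypothetical function names; the document does not say which hash Labnet stores or exactly how lnsysconfigd compares it.

```python
import hashlib
from pathlib import Path

def file_hash(path: Path) -> str:
    """Hex SHA-256 digest of a file, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def needs_install(installed: Path, recorded_hash: str) -> bool:
    """True when the installed copy is missing or differs from the recorded hash."""
    return not installed.exists() or file_hash(installed) != recorded_hash
```

Comparing a short digest rather than the file contents means the database only has to carry one small value per file instance.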


If we want to create a new file instance, we click on the Create New Instance button. This results in the following additional information being displayed on the first screen:




This information is necessary in order to define the set of computers that will receive this new instance. Additional input fields are displayed after selecting the Administrative Domain. In the case of the Group Select, a multivalued input box is provided so that you can define the members of the group. Finally, select the server on which to edit the file and click on the Confirm button. Other than the fact that an empty file is presented to you on the upload server, the process from here on is the same as for an existing file. Now that we have briefly looked at how configuration files are managed under Labnet, we turn to how Labnet tools are used to manage the actual software on the desktop client computers and the servers.

2.6 Writing Labnet File Templates

Labnet templates allow configuration files to be constructed from information contained within the Labnet configuration database. With one template, unique configuration files can be constructed and installed on each of our desktop computers and servers within seconds. If a configuration file needs to be changed, the change can be incorporated into the template and all the configuration files will be changed accordingly whereas without templates all instances of the file would need to be edited individually. Clearly templates are the way to go if at all possible.

Labnet templates use XML constructs to build file instances. For those who are not familiar with XML terminology, an XML file consists of a series of nested XML elements, where an XML element consists of a start markup tag such as <tagname>, followed by some content and terminated by an end markup tag such as </tagname>. If there is no content, the element can be written as a single markup tag with a '/' character after the name, such as <tagname/>. The content enclosed within the start and end markup tags can be straight text (e.g. <entity>hogwarts</entity>) or one or more nested XML elements (e.g. <entity><self/></entity>). In fact it can be a combination of both, as shown in the following example:

<varVal><entity><self/></entity><type>client</type>servers</varVal>

In this example the varVal element has two XML elements nested within it, <entity> and <type>, followed by the text string “servers”. Further, the <entity> element has the XML element <self/> nested within it. Note that this nesting imposes a tree structure on the elements of an XML file, with the outermost element being the root node.
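Python's standard xml.etree.ElementTree module can be used to display the tree for the example above. Text that follows a child's end tag, such as “servers”, is stored as that child's tail rather than as the parent's text. The show function is only an illustration.

```python
import xml.etree.ElementTree as ET

example = "<varVal><entity><self/></entity><type>client</type>servers</varVal>"
root = ET.fromstring(example)

def show(elem: ET.Element, depth: int = 0) -> None:
    """Print each element indented by its depth, with any text it carries."""
    text = (elem.text or "").strip()
    print("  " * depth + elem.tag + (f" text={text!r}" if text else ""))
    for child in elem:
        show(child, depth + 1)
        tail = (child.tail or "").strip()
        if tail:  # text after a child's end tag belongs to the enclosing element
            print("  " * (depth + 1) + f"(text {tail!r})")

show(root)
# varVal
#   entity
#     self
#   type text='client'
#   (text 'servers')
```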

The idea behind a template is that you want to take a generic template file, that has syntactic components delimited by XML tags, and expand the file by replacing the XML elements with appropriate text as indicated by the XML elements. The resulting text is then dumped into the output file that has been specifically tailored for a given host computer through this expansion process. Within the context of the Labnet template system the root node of the XML tree starts with the <Template> markup tag and ends with the </Template> markup tag. This means that all template files must begin with <Template> and end with </Template>. The straight text content within the Template element is treated as literal data and inserted into the output buffer unchanged. Certain XML markup language elements within the root node insert character strings into the output buffer according to the element's function. Some XML elements do not generate text for the output buffer but instead are used as syntactic elements that provide arguments to enclosing XML elements. It should be noted that many XML elements, even those that would normally cause text to be inserted into the output buffer, can be used as arguments to another XML element as long as they conform to the argument structure of that element as described below. Another point worth noting is that most XML elements return an array of character strings. Even a single character string result is returned as an array of character strings with one element. The main exceptions to this are the conditional XML elements that return true, false and invalid. In some contexts only one value is used and it is assumed to be the zeroth element of the string array. Nested XML expressions are evaluated recursively and their values percolate up to the upper levels of the nested hierarchy and are interpreted according to the markup element that encloses them. 
The string returned by the root <Template> node is the concatenation of the values of all of its nested elements into a single string.
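The recursive evaluation described above can be illustrated with a toy evaluator that handles only <Template>, <literal>, <self/>, <selftype/> and literal text. It is a sketch under stated assumptions (the function names and the context dictionary are hypothetical), not the Labnet implementation: every element yields an array of strings, whitespace-only text between elements is dropped, and the root joins everything into one string.

```python
import xml.etree.ElementTree as ET
from typing import Optional

def literal_text(s: Optional[str]) -> list[str]:
    """Text outside tags is literal; whitespace-only text is ignored."""
    return [s] if s and s.strip() else []

def evaluate(elem: ET.Element, ctx: dict[str, str]) -> list[str]:
    """Evaluate one element to an array of strings (tiny subset of the elements)."""
    if elem.tag == "self":
        return [ctx["self"]]
    if elem.tag == "selftype":
        return [ctx["selftype"]]
    if elem.tag == "literal":
        return [elem.text or ""]
    if elem.tag == "Template":
        out = literal_text(elem.text)
        for child in elem:
            out += evaluate(child, ctx)          # recurse into nested elements
            out += literal_text(child.tail)      # text between or after children
        return out
    raise ValueError(f"element not covered by this sketch: <{elem.tag}>")

def expand(template: str, ctx: dict[str, str]) -> str:
    """The root <Template> joins everything beneath it into a single string."""
    return "".join(evaluate(ET.fromstring(template), ctx))

print(expand("<Template>host=<self/><literal> </literal>type=<selftype/></Template>",
             {"self": "hogwarts", "selftype": "client"}))
# host=hogwarts type=client
```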

A brief description of the XML elements that insert text into the output buffer and their XML arguments are as follows:

  1. <literal> content </literal>

    This element simply returns its literal content as a single element string array. It is primarily used to insert white space between XML elements, because our XML parser is configured to ignore white space (including new lines) between XML elements so that templates can be laid out readably. Note that any text not enclosed by '<' and '>' characters is also considered literal content, unless it consists only of white space, in which case it is ignored.

  2. <self/>

    This element returns the entity name of the host for which this template is being evaluated. It takes no arguments.

  3. <selftype/>

    This element returns the entity type of the host for which this template is being evaluated. It takes no arguments.

  4. <setvalue> variable and value </setvalue>

    <getvalue> variable </getvalue>

    These two XML operators set and get the value of a variable respectively. The first child element, “variable”, must evaluate to a single string that is used as the variable name for both of these operations. For the <setvalue> operator the second child element, “value”, will be used to set the variable's value. Note that only the first string value from the string array “value” is used even though the string array may contain more than one string. The <getvalue> element returns the last value assigned to the specified variable or NULL if no value has been assigned by the <setvalue> operation.

  5. <setdefaults><type> content </type><entity> content </entity> contents affected by setdefaults </setdefaults>

    This element does not produce any value, nor does it result in any output being directed to the expanded output file. Its sole purpose is to set some internal context that allows shortcuts in defining other elements. The two values that can be set are the default entity name and the default entity type. Within the context of this element, any other elements that depend upon entity name and type can use these as default values. For example, the <varVal> element requires entity name and type information that can be implicitly provided by the values defined by this element. If other values are required, the entity and type information can be provided explicitly. Where indicated, these syntactic elements are optional as long as the default values are appropriate.

  6. <varVal> [<type> content </type>] [<entity> content </entity>] variables </varVal>

    This element provides a means for extracting values from the database, making them available to other XML elements or providing output to the expanded output file. The two syntactic elements <type> and <entity> delimit the type and entity name, respectively, of sys_config database entities. The 'variables' content is assumed to evaluate to the names of one or more variables whose values are being queried. The resulting array of strings returned by this element corresponds to the values of the variables for the specified entity of the specified type. If more than one variable is included, the resulting values are the concatenation of the variable values. Similarly, if more than one entity name is included, the resulting array of strings is the concatenation of the variable values for the list of entities. Note that the <entity> and <type> elements are optional. If one or both are missing, the missing values are taken from the internal context that can be set by the <setdefaults> XML element.

  7. <confVal>[<entity>content</entity>] variable </confVal>

    There is a special entity type, called lnappconf, in the sys_config database that is designed to hold application specific configuration information, and this element provides a means to access this information. The entities of this type are collections of variable value pairs that are associated with some application or function. The <entity> element is the name of the specific collection of information. If it is not specified, it takes its value from the one set in the <setdefaults> element. The remaining content of the element should evaluate to a single variable name. The element returns a list of one or more values that this variable has associated with it.

  8. <def_entity/>

    This element returns the value of the default entity name as set by the <setdefaults> element.

  9. <def_type/>

    This element returns the value of the default type as set by the <setdefaults> element.

  10. <is_entity>entity name [entity type] </is_entity>

    This element returns a boolean value and is therefore used within a conditional context. It takes one or two expressions as children. The first expression is an entity name and the optional following expression is an entity type. If the entity type is missing then it defaults to the default entity type defined by the current context. If the entity name is actually an entity within the entity type then the <is_entity> expression returns true.

  11. <OR_expr> [<type> content </type>] <var> content </var><val> content </val>... </OR_expr>

    <AND_expr>[<type> content </type>] <var> content </var><val> content </val>... </AND_expr>

    These elements are used to search the sys_config database for entity names that match the enclosed expressions. In the case of the <OR_expr> element, a match on any one of the enclosed expressions results in the inclusion of the entity name, whereas with the <AND_expr> element all the enclosed expressions must match in order for the entity name to be included. The expressions are composed of a series of variable name and value pairs delimited by the syntactic XML elements <var> and <val>, respectively. There must be at least one <var> <val> pair for a valid expression. If there is only one <var> <val> pair, it doesn't matter whether you use <OR_expr> or <AND_expr> to enclose the pair. If there is an error or no entities are found, this element returns NULL. As an example, let's say that we would like the entity names of all the application servers within the computer science domain. The following expression yields these names as an array of entity names:

      <AND_expr><type>server</type>

      <var>functions</var><val>appserver</val>

      <var>domain</var><val>cs.mun.ca</val>

      </AND_expr>

  12. <iterateOverLists>

      <iteratorList> array of strings </iteratorList> …

      mixture of XML and literal content

    </iterateOverLists>

    This XML element produces a string array value that can be passed on to another XML element as an array of strings, one string element per iteration. If this element appears in the final output context, the resulting string array elements are concatenated together to form the final output string. More specifically, this XML element is used to generate repetitive text output. The 'mixture of XML and literal content' forms the body of the text that is repeatedly dumped to the expanded output buffer. The <iteratorList> elements control the number of iterations of the body that get dumped to the expanded file and can influence the content of the output text through the use of the <item> element described below. There must be at least one <iteratorList> element present, and the number of strings in its evaluated array of strings determines the number of repetitions of the body. If more than one <iteratorList> element is present, they must all have the same number of strings in their corresponding arrays of strings. Subsequent <iteratorList> elements provide additional information that can be incorporated into the resulting body rather than affecting the number of iterations. In most cases multiple <iteratorList> elements are used to dump the contents of tables in the database. The <item> element can only be used within the context of an <iterateOverLists> element and has the value of the current string from the 'array of strings' defined by the corresponding <iteratorList> element. In other words, the ith iteration of the body being evaluated and dumped to the expanded output file has an <item> value that corresponds to the ith string from the array of strings. Note that the <iteratorList> element is a syntactic element that can only exist within the <iterateOverLists> element.

  13. <item> numeric content </item> or <item/> or

    <item level=”number”>numeric content</item> or

    <item level=”number”/>

    This element, although only available within the <iterateOverLists> element, can be used in a variety of ways and so requires some additional explanation. The first consideration is the number of possible variants of this element. The simplest and most common usage is the <item/> form, which has no content and has the value of the ith string from the string array of the 0th <iteratorList> element. If there is more than one <iteratorList> element, some way must be found to represent the ith string from the additional <iteratorList> elements. To do this, an index must be provided to indicate which <iteratorList> element the ith string is to be extracted from, and the second form, <item>index</item>, is used for this. Finally, a further complication occurs when an <iterateOverLists> element is used within the body of another <iterateOverLists> element. To indicate that you would like the value of the <item> element from an enclosing <iterateOverLists> element, the level='integer' attribute can be applied to the <item> element to indicate how many levels out the string value is to be taken from. The innermost level is the 0th level and is assumed by default. The 1st level is the level immediately enclosing the innermost level, and so on. Again, if there is only a single <iteratorList> element in an outer level, the shorthand form of the <item> element, <item level='num'/>, can be used.

  14. <select><index> numeric content </index> array of strings </select>

    Given content that evaluates to an array of strings and an <index> element that evaluates to an integer 'n', this element evaluates to the n'th string in the array of strings. Note that the <index> element is used as a syntactic delimiter within the context of the <select> element and is not used as an independent element.

  15. <indexof> pattern and string array </indexof>

    The first child of the <indexof> XML operator, “pattern”, must evaluate to a string, and the next child, “string array”, must evaluate to an array of strings. The pattern string is compared with each of the string array values and the index of the first matching string is returned. Note that this match is just a simple string comparison that must match completely. Also note that the index returned is a numeric ASCII string and can be used as an index in the <select> element or as text output.

  16. <numStrings> content </numStrings>

    Content is evaluated to an array of string values and the “numStrings” element will return a numeric string which has the value of the number of string array elements contained in content.

  17. <plus> first_numeric_value second_numeric_value </plus>

    <minus> first_numeric_value second_numeric_value </minus>

    <mul> first_numeric_value second_numeric_value </mul>

    <div> first_numeric_value second_numeric_value </div>

    These basic prefix integer operator XML elements allow for the evaluation of the respective arithmetic operations when provided with 2 child nodes that each evaluate to numeric integer string values. The resulting integer value is converted to a string and returned as a single numeric string value which can be used literally or in conjunction with other expressions.

  18. <if> condition <then> then-content </then> [<elseif> condition <then> then-content </then> </elseif>]... [<else> else-content </else>] </if>

    This element returns the value of one of the 'then-content' or 'else-content' XML sequences depending on the values of the 'condition' components. If the 'condition' of the <if> or <elseif> elements evaluates as true then the XML and text content of 'then-content' are evaluated and returned in an array of strings. Otherwise if an <else> element exists then the XML and text value of its 'else-content' are evaluated and the results returned as an array of strings. The following commands are recognized specifically in the context of a conditional:

    1. <equal> content1 content2 </equal>

      <noteq> content1 content2 </noteq>

      These conditional elements take two children that evaluate to string values and compare them for equality or inequality, respectively. They return true if the strings are equal or not equal, respectively, and false otherwise.

    2. <hasvalue> string_element string_value </hasvalue>

      This conditional element returns true if the string_value is found in the list of strings given by string_element.

    3. <match><pattern> regular expression </pattern> searched-string </match>

      This conditional element takes a regular expression, syntactically delimited by the <pattern> element, and matches it against the 'searched-string' value of the rest of the element. It returns true if the pattern matches the string and false if it does not.

    4. <or> expression1 expression2 … </or>

      <and> expression1 expression2 … </and>

      These elements evaluate the sequence of contained expressions and perform a logical “or” or “and” on their values respectively, returning either true or false.

    5. Many of the other XML elements that return string values can also be used in the context of a conditional expression. If they succeed and return a non NULL value then the value of the expression is considered to be true. If they return with a NULL value then they are considered to be false.

  19. <substitute> <pattern> content </pattern> string_content substitute_content </substitute>

    This element compiles the regular expression contained within the <pattern> element and then matches it against the string value of string_content. If the pattern matches anywhere within the string, the value of substitute_content replaces the leftmost matched portion of the string. If there is no match, string_content is returned unchanged. It should be noted that one or both of string_content and substitute_content must be an XML element so as to delimit one from the other. The regular expression can contain groupings delimited by parentheses. The characters matched by these groups can be inserted into the substitution string if the \0 through \9 tokens appear somewhere within the substitute_content string. The \1 through \9 tokens within the substitute string are replaced by the characters matched by parenthesized groups 1 through 9 respectively, and \0 is replaced by the entire matched portion of the string. The resulting substitution string is then embedded within the larger string_content and returned as a string value to its enclosing element.

  20. <getentitylist> content </getentitylist>

    This element simply returns a list of the entities of the specified entity type. If no entity type is specified, the default entity type set by <setdefaults> is used. If that is NULL, or if there are no entities, this element returns NULL.

  21. <unique> array of strings </unique>

    This element takes an array of string values and removes duplicates.

  22. <sorted> array of strings </sorted>

    This element takes an array of string values and sorts them alphabetically.

  23. <line> xml elements ... </line> and <cat> xml elements ... </cat> both concatenate the values of each of their child elements and return this string. The <line> element differs from the <cat> element in that it appends a newline character to the string. The <line> element can optionally contain <ts> integer </ts> elements that act as tab stops for formatting columns within each line. The integer specifies the character count within the line at which the succeeding text will begin. The space character is used to pad columns to the desired character count position. If the tab stop position has already been passed, no padding is performed and subsequent strings are simply appended.
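The semantics of several of the elements above can be approximated over an in-memory stand-in for the sys_config database. Everything here is hypothetical: the DB layout, the server names 'garfield' and 'nermal', the function names, the order in which <varVal> concatenates values across entities and variables, and the reading of a <ts> stop as a zero-based column. Nested <item level=...> references are not modelled.

```python
import re
from typing import Callable, Optional

# Hypothetical stand-in for sys_config: type -> entity -> variable -> list of values.
DB: dict[str, dict[str, dict[str, list[str]]]] = {
    "server": {
        "odie": {"functions": ["appserver", "ldap"], "domain": ["cs.mun.ca"]},
        "garfield": {"functions": ["appserver"], "domain": ["example.org"]},
        "nermal": {"functions": ["backup"], "domain": ["cs.mun.ca"]},
    },
}

def var_val(etype: str, entities: list[str], variables: list[str]) -> list[str]:
    """<varVal>: values concatenated entity by entity, then variable by variable."""
    out: list[str] = []
    for ent in entities:
        for var in variables:
            out += DB.get(etype, {}).get(ent, {}).get(var, [])
    return out

def search(etype: str, pairs: list[tuple[str, str]], use_and: bool) -> Optional[list[str]]:
    """<AND_expr> if use_and else <OR_expr>; returns None (NULL) when nothing matches."""
    hits = []
    for name, variables in DB.get(etype, {}).items():
        tests = [val in variables.get(var, []) for var, val in pairs]
        if (all(tests) if use_and else any(tests)):
            hits.append(name)
    return hits or None

def iterate_over_lists(lists: list[list[str]],
                       body: Callable[[Callable[..., str]], str]) -> list[str]:
    """<iterateOverLists>: one body result per index; item(k) plays <item>k</item>."""
    n = len(lists[0])
    if any(len(lst) != n for lst in lists):
        raise ValueError("all <iteratorList> arrays must have the same length")
    return [body(lambda k=0: lists[k][i]) for i in range(n)]

def substitute(pattern: str, string: str, replacement: str) -> str:
    """<substitute>: replace the leftmost match; \\0 = whole match, \\1-\\9 = groups."""
    m = re.search(pattern, string)
    if m is None:
        return string  # no match: string returned unchanged
    expanded = re.sub(r"\\([0-9])",
                      lambda tok: m.group(int(tok.group(1))) or "", replacement)
    return string[:m.start()] + expanded + string[m.end():]

def line(parts: list[object]) -> str:
    """<line>: join parts, an int acting as <ts>n</ts> (pad to column n), add a newline."""
    text = ""
    for part in parts:
        if isinstance(part, int):
            text += " " * max(0, part - len(text))  # no padding once past the stop
        else:
            text += str(part)
    return text + "\n"

print(search("server", [("functions", "appserver"), ("domain", "cs.mun.ca")], True))
# ['odie']
names = ["odie", "garfield", "nermal"]
domains = [var_val("server", [n], ["domain"])[0] for n in names]
print("".join(iterate_over_lists([names, domains],
                                 lambda item: line([item(), 12, item(1)]))), end="")
```

The last call mirrors the common pattern of two <iteratorList> elements dumping a table: the first list sets the number of lines and the second supplies a second column.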

2.7 Certificate Management

One of the many tasks performed by lnsysconfigd and lnclientconfigd is managing their respective certificates. Certificates are required to initiate and encrypt IPsec connections and LDAP queries. In addition, the ssh host based authentication that allows us to establish a web of trust among our Labnet clients and servers is made possible through the management of public and private key pairs for our clients and servers, and again this job is performed by lnsysconfigd.

It should be noted that the certificates are checked daily during the lnsysconfigd sanity check, or when a check is initiated through the webtools by setting the appropriate cert_cmd variable in the server's entity description. On client computers, lnclientconfigd performs checks whenever relevant variables change in the configuration database.

2.8 Template Aware TFTP Daemon

All diskless client computers boot using the PXE boot sequence built into the BIOS and network controller. When the client sends out a DHCP request, the server designated to service this client responds with the name of the boot program and the boot configuration menu to be used. The boot program is loaded via the lntftpd daemon running on the server. This daemon is a modified TFTP daemon, developed specifically for Labnet, that customizes the boot configuration menu according to a template file. This allows the administrator to boot clients in many different configurations without having an unmanageable number of individual boot menus. The following is a typical boot menu template used to boot a diskless client:


<Template>

<setdefaults><type>client</type><entity><self/></entity>

DEFAULT <varVal>kernel</varVal>

APPEND root=/dev/nfs init=/etc/rc.d/<varVal>preinit_script</varVal> nfsroot=/diskless/rootfs,v3 ip=dhcp <if><varVal>bootOptions</varVal><then><varVal>bootOptions</varVal></then></if> softlevel=<varVal>run_level_name</varVal>

</setdefaults>

</Template>


Note that this template allows the kernel, the runlevel and boot options to be configured dynamically into the boot menu when a given client computer is booted. For example the actual menu file that is sent to the client computer “hogwarts” based on its sys_config entity values is as follows:


DEFAULT bzImage.P4-2.6.38-gentoo-r2

APPEND root=/dev/nfs init=/etc/rc.d/rc.preinit.moreminimal nfsroot=/diskless/rootfs,v3 ip=dhcp softlevel=ipsec


where the “bootOptions” variable is not defined and the other variables are defined as follows:















Mandatory Client Variables
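For comparison, the expansion of the boot menu template can be approximated in plain Python. This is a hypothetical sketch, not lntftpd: the variable values for 'hogwarts' are read back from the generated menu shown above, the dictionary stands in for the sys_config lookup, and the optional bootOptions value is emitted only when it is set, as the <if> element does.

```python
def boot_menu(v: dict[str, str]) -> str:
    """Approximate the boot menu template above for one client's variables."""
    opts = v.get("bootOptions")  # optional: emitted only when it has a value
    append = ("APPEND root=/dev/nfs init=/etc/rc.d/" + v["preinit_script"]
              + " nfsroot=/diskless/rootfs,v3 ip=dhcp "
              + (opts + " " if opts else "")
              + "softlevel=" + v["run_level_name"])
    return "DEFAULT " + v["kernel"] + "\n" + append + "\n"

# Values for 'hogwarts', read back from the generated menu shown above.
hogwarts = {
    "kernel": "bzImage.P4-2.6.38-gentoo-r2",
    "preinit_script": "rc.preinit.moreminimal",
    "run_level_name": "ipsec",
}
print(boot_menu(hogwarts), end="")
```

The template form is preferable in practice because the same file serves every client; the Python version only makes the substitution explicit.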



2.9 Configuring a Diskless Linux Client

As the example in Section 2.8 shows, the sys_config database can be used to customize virtually any part of the boot process. However, how can configuration files be customized on the diskless client computers when the file system is mounted read-only and shared among many client computers? This is made possible by the combined efforts of the lnsysconfigd daemon running on the appserver and the lnclientconfigd daemon running on the client. When the appserver processes a managed file (see Section 2.x) destined for the client file system, lnsysconfigd first checks whether any of this server's clients has a different version of the file. If they are all the same, the file (or processed template) is simply placed in situ in the read-only file system. If there are two or more versions of the managed file, lnsysconfigd replaces the configuration path name with a symlink pointing to “/local_conf/<config file name>”. Meanwhile, one of the first things the client computer does when it boots is to mount a tmpfs file system on the “/local_conf” directory, so that local configuration files can be written there. The lnclientconfigd daemon running on each client computer then creates the version of the configuration file appropriate for that client in this directory. Programs accessing the configuration file are transparently redirected by the symlink to the customized file in the client's “/local_conf” directory and behave as if they were reading the actual file.
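The appserver's choice between writing a managed file in place and redirecting it to /local_conf can be sketched as follows. The function name, its return labels, and the use of the file's base name under /local_conf are assumptions made for illustration; mounting the tmpfs and writing the per-client copy (lnclientconfigd's part) are not shown.

```python
import os
from pathlib import Path

def place_managed_file(rootfs: Path, rel_path: str, versions: dict[str, str]) -> str:
    """Install one managed file into the shared client root file system.

    versions maps each client of this appserver to its rendered file content.
    Returns "in-situ" or "symlink" (labels invented for this sketch).
    """
    if not versions:
        raise ValueError("no client versions supplied")
    target = rootfs / rel_path.lstrip("/")
    target.parent.mkdir(parents=True, exist_ok=True)
    if target.is_symlink() or target.exists():
        target.unlink()                      # replace whatever was there before
    distinct = set(versions.values())
    if len(distinct) == 1:
        target.write_text(distinct.pop())    # all clients agree: write it in place
        return "in-situ"
    # Clients differ: point at the per-client copy on the tmpfs /local_conf.
    os.symlink("/local_conf/" + Path(rel_path).name, target)
    return "symlink"
```

The symlink target is absolute on purpose: it is resolved on the client, where /local_conf is the client's own tmpfs, not on the appserver.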

Assuming that there is a functioning appserver on the local area network where a client computer is located, it should be possible to boot the client computer disklessly. Of course, the availability of a suitable image is a consideration. Currently our client images support 32-bit Linux compiled for fairly generic Intel chip sets, so most Intel-based computers can be booted from our image. Network speed is also a consideration. Generally speaking, our diskless clients boot across a 1 gigabit network connection, but some clients boot over 100 megabit connections and are quite usable. The next consideration is the client's sys_config configuration. Each client computer has an entity of type “client” defined in the sys_config database describing its configuration. The entity name concatenated with the value of the domain variable is the fully qualified name of the client as it appears in DNS and the hosts file. The following table outlines the mandatory client variables currently in use and their descriptions:

Variable          Description
domain            Domain for this client
ip                IP address assigned to this client
mac_address       Ethernet hardware address used by DHCP to recognize this client
kernel            Linux kernel loaded by boot program
bootfile          Boot executable loaded to initiate the boot process (pxelinux.0 for Linux-only clients)
menu_template     Boot menu used by boot executable to load kernel and pass kernel parameters
preinit_script    Name of script in /etc/rc.d that is run before the init daemon starts
run_level_name    Name of the run level that the client boots up
servers           Name of the application server responsible for booting this client
ntpserver         List of the ntp server names that will be serving broadcast ntp data for this client



Note that all of these variables must be given values. Most of these variables are quite system specific, but their values should not be too hard to figure out. The kernel and bootfile variable values must be the names of kernel and boot file images, respectively, that reside in the “/diskless/boot” directory on the appserver specified by the “servers” variable. The “menu_template” variable value must be the name of a file located in the “/diskless/boot/template” directory of the boot appserver, and the “preinit_script” variable value must be the name of a shell script located in the “/diskless/rootfs/rc.d” directory of the appserver. The “run_level_name” variable value must be the name of a directory that contains the scripts that are run at system startup. As an example, see the figure Mandatory Client Variables above.
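A quick pre-boot check of a client entity might look like the sketch below. It is hypothetical (no such Labnet tool is described here): it verifies that every mandatory variable has a value and that the kernel, bootfile and menu_template files exist in the directories named above. It is meant to be run on the appserver given by the servers variable, with the entity supplied as a plain dictionary.

```python
from pathlib import Path

# Mandatory client variables, as listed in the table above.
MANDATORY = ["domain", "ip", "mac_address", "kernel", "bootfile", "menu_template",
             "preinit_script", "run_level_name", "servers", "ntpserver"]

def check_client(entity: dict[str, str],
                 boot_dir: Path = Path("/diskless/boot")) -> list[str]:
    """Return the problems found in a client entity; an empty list means none found."""
    problems = [f"missing {var}" for var in MANDATORY if not entity.get(var)]
    for var in ("kernel", "bootfile"):
        name = entity.get(var)
        if name and not (boot_dir / name).is_file():
            problems.append(f"{var} {name!r} not found in {boot_dir}")
    tmpl = entity.get("menu_template")
    if tmpl and not (boot_dir / "template" / tmpl).is_file():
        problems.append(f"menu_template {tmpl!r} not found in {boot_dir / 'template'}")
    return problems
```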

To boot Linux only a few of the “Optional Variables” are necessary. The following variables are used by Linux:

Variable                    Description
room                        Location of computer
ssh_key                     Public ssh key for this client (initialized by the lnclientconfigd)
printers                    Entity names for the printers that are available to this client
defaultprinter              Entity name for the default printer
graphical_theme             Determines the splash screen and greeter configuration
inittab_type
num_of_virtual_terminals    Determines the number of virtual consoles configured in /etc/inittab



These are the basic variables that should be set to bring up Linux. The room variable value indicates the location of the computer and the ssh_key variable is filled in automatically. If you are not using printers, these values can be left blank.

Additionally, the “client” entity type currently supports four tables: “Access”, “fstab”, “Config_Files_Table” and “crontab_tasks”. If your “/etc/security/access.conf” uses default values and you have no cron jobs or special disks to put in the fstab, these tables can be left empty. If the client computer does not have any managed files that are members of a named group, the “Config_Files_Table” table can also be left empty.

Assuming that you have a reasonable client entity defined and a properly configured appserver (see Section 3.x), you should now be able to boot the diskless client computer specified in your client entity. The next step is to enable the PXE boot option on the client computer. Once it is enabled, the client should start sending out DHCP requests using its hardware address. This is one way to obtain the value for the mac_address mandatory variable mentioned above. Within a few seconds the DHCP daemon on the appserver should respond, and the client will then load pxelinux.0 followed by the boot menu options. If Linux is the default OS, Linux should immediately load and mount its root file system from the application server. The Linux splash screen and greeter should then appear, and users can log in with their username and password. At the end of the session the user logs out and the splash screen reappears, ready for the next user.
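The hardware address shown during the PXE boot may be written with spaces, dashes, colons or no separators at all, depending on the firmware. A small Python helper can convert it to one consistent form before it is entered as mac_address. This is a sketch that assumes the configuration database expects lowercase, colon-separated hex. The text does not state this, so match whatever format your existing client entities use. The function name is hypothetical.

```python
import re

# Six hex pairs separated consistently by ':', '-' or ' ', or not at all.
_MAC_RE = re.compile(r"[0-9a-f]{2}([:\- ]?)(?:[0-9a-f]{2}\1){4}[0-9a-f]{2}",
                     re.IGNORECASE)


def normalize_mac(raw: str) -> str:
    """Return `raw` as lowercase, colon-separated hex (assumed target format)."""
    text = raw.strip()
    if not _MAC_RE.fullmatch(text):
        raise ValueError(f"not a MAC address: {raw!r}")
    digits = re.sub(r"[:\- ]", "", text).lower()
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


if __name__ == "__main__":
    print(normalize_mac("00 1B 21 0A 4C 5E"))
```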

3 Image Management Software

3.1 Replication of Master Images

One of the most daunting tasks in managing 1000+ computers is software management. To ensure homogeneous behavior among our servers, all Labnet servers are imaged from one of a small number of primary images. Having a comprehensive complement of software installed does not impair the functionality of a server, because only the services that are configured to run are actually active. To upgrade a package or add new software, the install or upgrade needs to be performed only once, in a chrooted environment on the server containing the master image. In the evening, the updated software is distributed to all servers that use this image. To minimize the time it takes to distribute images to our servers, a hierarchical approach is taken. The following diagram illustrates how a large number of servers can be imaged in a short period of time:





It should be noted that none of the images are active, so even the software running on the software distribution servers must be loaded from an image. The information from the master configuration database is used to orchestrate all rsync activities.
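Each server receives a given image from exactly one source, so the distribution hierarchy forms a tree rooted at the server that holds the authoritative image. The following minimal Python sketch groups servers into waves so that no server is refreshed before its own source. It assumes the source relationships have been pulled from the configuration database into a hypothetical {server: source} mapping.

```python
from collections import defaultdict


def sync_waves(image_host: str, sources: dict) -> list:
    """Group servers into refresh waves for one image.

    `sources` maps each server to the server it receives the image from.
    Wave 0 holds the servers fed directly by `image_host`, wave 1 those fed
    by wave 0, and so on. Servers that cannot be traced back to
    `image_host` do not appear in any wave.
    """
    children = defaultdict(list)
    for server, source in sources.items():
        children[source].append(server)

    waves = []
    seen = {image_host}
    current = sorted(children[image_host])
    while current:
        waves.append(current)
        seen.update(current)
        # The next wave is every not-yet-seen server fed by this wave.
        nxt = [child for server in current
               for child in children[server] if child not in seen]
        current = sorted(nxt)
    return waves
```

Servers in the same wave do not depend on one another, so they can be refreshed in parallel. The number of waves grows with the depth of the tree rather than with the number of servers, which is the intuition behind the hierarchical scheme.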


All the Labnet desktop client computers run Linux disklessly. Their software depends on the server that provides the NFS mount point for the root file system. Servers that function in this capacity are called application servers. They have an additional software partition that is used by the diskless client computers configured to boot from them. This diskless partition is also imaged from one of several diskless images using the same scheme illustrated above. Many of our client computers also dual boot Microsoft Windows, and their Windows images are distributed using this scheme as well.


Diskless desktop computers can be booted from different application servers under the control of the master configuration database. Setting the 'server' variable in the client's entity and rebooting the client is all that is needed to bring the client up on a different application server. A batch edit command issued from the configuration webtool can be used to upgrade a whole lab in under 5 minutes, or to move a lab to another application server for maintenance.


To complete the discussion on imaging, it is necessary to look at the images that are kept in our repository and how the dissemination of images is managed. The following diagram illustrates the categories of images that are currently maintained to support our managed software roll outs:





Notice that three independent Linux software roll out environments (development, test and production) are supported for both the servers and the diskless desktop client images. Each environment contains an image for the root directory and a corresponding '/var' directory. Additionally, 64-bit and 32-bit image versions are supported for our servers. Our development images are used by our developers to test new versions of software or create new distributions. The Department of Computer Science uses the test images as a staging area to identify any issues that need to be addressed before the software is moved into the production environment that supports the rest of campus. Microsoft Windows XP image directories come in two flavours: one for the C: disk snapshots and one for the M: shared software network drive. The C: snapshots are stored in several separate image directories, each targeted for download onto specific application servers to support their particular client hardware. The M: network drive is common to all application servers that support Windows XP. Microsoft Windows 7 does not need an M: drive but still has an images directory for the Windows 7 images.


In total, we currently have 16 Linux image directories and 10 Microsoft image directories. One might ask how it is possible to ensure that all these images end up on the appropriate servers, in the correct directories, in a timely fashion. To accomplish this, the Labnet utility 'lnrsync' is launched nightly via cron on each of our distribution servers. Guided by the information in the Rsync table of each server's entity, it runs the rsync command to refresh the destination host's software directories. A special Rsync Info Editor webtool was created to provide a visual representation of the rsync process. The following screen shot shows the page that allows the user to select an appropriate view of the rsync tree:





There are three view modes available. The first mode is used to view the 'rsync' tree for the specified image. It also offers the option to prune branches from other image trees and splice them onto this tree. This tool is how the administrator regulates the upgrade process on a per-server basis. The following screen shot illustrates how one would move 'scout' and 'keeper' from the diskless_dbgen32_test_1 image to the diskless_dbgen32_prod_1 image via the sync server 'ratchet':





Servers are selected by clicking on their names in the left-hand tree, and their names should appear in the box in the bottom left-hand corner. Next, click the server in the right-hand tree from which these servers are to receive their image; its name should appear in the bottom right of the screen. Clicking the Submit Change button makes the changes in the configuration database and displays the new trees.
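Conceptually, a prune-and-splice operation only changes the source from which each selected server receives its image. The minimal Python sketch below models this, assuming the tree is held as a hypothetical {server: source} mapping. The refusal to create a loop (a server pulling, directly or indirectly, from itself) is an assumption added for illustration, not documented webtool behaviour.

```python
def splice(sources: dict, servers: list, new_source: str) -> dict:
    """Return a copy of `sources` in which every server in `servers`
    receives its image from `new_source` instead of its old source.

    `sources` maps server name -> name of the server it syncs from.
    Raises ValueError if the change would make a server sync from itself,
    directly or through intermediate servers.
    """
    updated = dict(sources)
    for server in servers:
        updated[server] = new_source

    # Walk upwards from the new source; meeting a moved server means a loop.
    node, visited = new_source, set()
    while node in updated and node not in visited:
        if node in servers:
            raise ValueError(f"{node} would end up syncing from itself")
        visited.add(node)
        node = updated[node]
    return updated
```

The input mapping is copied rather than modified. This mirrors the webtool's behaviour of showing the new trees only after the change has been submitted.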


The second mode is used to view the 'rsync' tree for the specified image along with the rsync time stamps. This is useful for determining when servers were last synced. Server names highlighted in red have syncing turned off. Dates that are highlighted in red indicate that the rsync has not been completed within the current day. Syncing can be turned on and off by clicking on the host name. A screen shot of this mode is shown as follows:




Note that servers 'lnmtr' and 'stbons' have syncing turned off.
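The two highlighting rules in this mode are simple enough to state as code. The sketch below is a hypothetical Python version of them, not taken from the webtool's source. It assumes each server record provides a sync on/off flag and the time its last rsync finished.

```python
from datetime import date, datetime
from typing import Optional


def sync_highlights(enabled: bool, last_sync: Optional[datetime], today: date) -> dict:
    """Decide which parts of a server's entry are shown in red.

    `enabled` is the server's sync on/off flag and `last_sync` is when its
    last rsync completed (None if it never has).
    """
    return {
        # Name in red: syncing is turned off for this server.
        "name_red": not enabled,
        # Date in red: no rsync has completed within the current day.
        "date_red": last_sync is None or last_sync.date() < today,
    }
```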


The third and final view mode is from the perspective of a selected server. In this mode, the source for each of its software directories is displayed. The following screen shot displays the sync tree information for the server 'odie':




The server and all its synced directories are displayed in the leftmost column, and the source for each image is displayed in the rightmost column, with all the intermediary servers shown in between. The image names are shown in bold. Left-clicking an image name brings you to the rsync tree display, as in the first mode. Holding Control and left-clicking the image name brings you to the entity that keeps track of all the image entities. All images must have an entity description. The following screen shot displays the image definition for dvars_dbgen32_test_1:




The host variable indicates the server that contains the authoritative image directory and the directory variable is the path to the image directory on that server. The Rsync_Filter_Tab table is used to create the file of include/exclude rules used by 'rsync' during the transfer. In this example, only the contents of the template directory will be copied and all other files and directories will be excluded. By default, all managed files are automatically added to the exclude list as they are dealt with separately by 'lnsysconfigd'.
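The following minimal Python sketch shows how the pieces described above could fit together: rows of the Rsync_Filter_Tab become an rsync filter file, and an rsync command uses that file. It assumes each table row is a hypothetical ('+' or '-', pattern) pair and that the managed files are supplied as a list of paths. The options -a and --delete are standard rsync options, but whether lnrsync passes exactly these is an assumption.

```python
import shlex
from typing import Optional


def build_filter_rules(table_rows: list, managed_files: list) -> str:
    """Return the text of an rsync filter file.

    `table_rows` are hypothetical ('+' or '-', pattern) pairs taken from the
    Rsync_Filter_Tab table; `managed_files` are paths (relative to the image
    root, with a leading '/') that lnsysconfigd handles separately. rsync
    acts on the first rule that matches, so the managed-file excludes are
    written before any of the table's rules.
    """
    lines = [f"- {path}" for path in managed_files]
    for action, pattern in table_rows:
        if action not in ("+", "-"):
            raise ValueError(f"unknown filter action {action!r}")
        lines.append(f"{action} {pattern}")
    return "\n".join(lines) + "\n"


def rsync_command(src_dir: str, dest_host: str, dest_dir: str,
                  filter_file: Optional[str] = None, dry_run: bool = False) -> list:
    """Build an rsync argument vector mirroring src_dir onto dest_host:dest_dir.

    The trailing slash on the source makes rsync copy the directory's
    contents rather than the directory itself.
    """
    argv = ["rsync", "-a", "--delete"]
    if filter_file:
        argv.append(f"--filter=merge {filter_file}")
    if dry_run:
        argv.append("--dry-run")
    argv.append(src_dir.rstrip("/") + "/")
    argv.append(f"{dest_host}:{dest_dir.rstrip('/')}/")
    return argv


if __name__ == "__main__":
    # Rules matching the example above: copy only the template directory.
    rows = [("+", "/template/"), ("+", "/template/**"), ("-", "*")]
    print(build_filter_rules(rows, []), end="")
    # Hypothetical paths and host; print the command instead of running it.
    print(shlex.join(rsync_command("/images/dvars", "scout", "/diskless/vars",
                                   filter_file="/tmp/dvars.filter", dry_run=True)))
```

With --delete, rsync does not delete files on the receiving side that match an exclude rule unless --delete-excluded is also given. This means a managed file that lnsysconfigd has written on the destination is left alone.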


3.2 Windows XP Image Booting

Many of our desktop client computers also run Microsoft Windows XP. Because Microsoft operating systems prefer to operate from a hard drive on the local computer, these computers have local hard drives that are imaged from special snapshots stored on the application servers. Whenever Linux starts up on a client computer, one of the startup tasks is to re-image the hard drive partition containing the Microsoft software. The special snapshots contain only the active (used) blocks of the disk, along with a precomputed hash for each block. The client computer therefore only needs to compute the hash of each active block on its local disk and compare it with the hash stored in the NFS-mounted image directory on the server. A block needs to be downloaded from the server only if the hashes do not match. The special snapshots are created by makeImage, a Labnet utility based on libraries from the PartImage package and rsync, in conjunction with the WIMP webtool (the Windows Image Management Program). WIMP takes the administrator through the Microsoft software install procedure step by step and initiates the snapshot at the end. Keeping the image small means less time to re-image the disk. Because we endeavour to install Microsoft software to a network drive, most of our images are less than 8 gigabytes in size and can be re-imaged in under 1 minute. Computers with small images are re-imaged as soon as a user logs out; computers with larger images are re-imaged on demand through the WIMP webtool. The following diagram illustrates a typical client application server configuration:




Note that, since the network is the bus that connects the diskless computers to their virtual disk, a minimum of 100 megabit Ethernet is required to have reasonable response time, although 1 gigabit Ethernet is preferable, especially for the application server. The Windows XP network share along with the Windows XP hard drive images are stored on the application servers and are imaged nightly along with all the main server partitions.
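The block comparison behind the Windows XP re-imaging can be illustrated in a few lines. This is a minimal Python sketch, not makeImage itself. It assumes fixed-size 4096-byte blocks and uses SHA-1 from hashlib as a stand-in for whatever hash the snapshots actually store. The snapshot is modelled as a mapping from block number to the stored hash for active blocks only, plus a hypothetical `fetch_block` callable that reads one block from the server.

```python
import hashlib
from typing import Callable

BLOCK_SIZE = 4096  # assumed block size for this sketch


def reimage(disk: bytearray, snapshot_hashes: dict,
            fetch_block: Callable[[int], bytes]) -> int:
    """Bring the active blocks of `disk` in line with the snapshot.

    For each active block, hash the local copy and compare it with the
    precomputed hash from the snapshot; only blocks whose hashes differ are
    fetched from the server. `fetch_block` must return exactly BLOCK_SIZE
    bytes. Returns the number of blocks fetched.
    """
    fetched = 0
    for block_no, wanted in snapshot_hashes.items():
        start = block_no * BLOCK_SIZE
        local = bytes(disk[start:start + BLOCK_SIZE])
        if hashlib.sha1(local).digest() != wanted:
            disk[start:start + BLOCK_SIZE] = fetch_block(block_no)
            fetched += 1
    return fetched
```

Blocks that are not listed in the snapshot (unused space) are never read or fetched. This is why a small image re-images quickly.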

3.3 Windows 7 iSCSI Booting

3.3.1 iSCSI Motivation

Over the years we have had to adopt various methods to manage Microsoft Windows images. Our first attempt at supporting Windows 7 imaging on our client computers was less than satisfactory. Although we were able to get our imaging software to work and to boot Windows 7 successfully, we ran into problems while installing software based on the .NET framework: the software installers appear not to allow installing to any drive but the C: drive. In the past, to minimize the size of the C: drive, we relied on being able to install most of our software onto the M: drive. With a small C: drive we are able to image the client computer after every login session. With image sizes reaching 80 gigabytes or more, however, imaging between login sessions is clearly impossible. To get around this dilemma, we decided to investigate booting our client computers from iSCSI disks supplied and managed by the server. In principle this was a perfect solution to our problem, but it required reinventing our imaging strategy completely.

3.3.2 iSCSI Implementation

After a great deal of research and a lot of trial and error, we developed an imaging strategy based on the iSCSI and logical volume management technologies built into the Linux kernel. The Linux kernel's Logical Volume Manager (LVM) driver was instrumental in letting us create a whole series of readable and writable disk partitions for each client computer from a single read-only master image. This was made possible by the LVM snapshot mode of operation. That mode is normally used to back up active partitions by making them appear quiescent even while they are being written to. As it turns out, snapshot mode can also be used to create a series of read/writable partitions that shadow a single read-only disk partition by redirecting write operations to a writable partition, often called a Copy On Write or COW partition. In other words, a read of the virtual disk returns the content of the read-only disk being shadowed, whereas disk writes are intercepted and written to the COW disk instead. If an attempt is made to read a block that has already been written to, the new value of the block is returned from the COW partition. The beauty of this technology is that a COW partition can be created in 1 to 4 seconds, regardless of the size of the partition that it is shadowing. The COW partition only needs to be big enough to hold all the data written during the login session. At the end of the session the COW partition can be deleted and a new empty COW partition created, which ensures that the next user gets a pristine image. As a bonus, there is no need to create an M: drive for the software, because all the software can now be installed onto the virtual C: drive. This makes creating images considerably easier.
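The read and write redirection described above can be modelled in a few lines. The Python class below is a sketch of the semantics only, with blocks held in memory and a hypothetical class name. It is not how the LVM snapshot target is implemented.

```python
class CowOverlay:
    """Copy-on-write view of a shared, read-only list of blocks.

    Reads return the COW copy of a block if it was written during this
    session, otherwise the block from the shared origin. Writes only ever
    touch the COW store, so the origin stays pristine for every client.
    """

    def __init__(self, origin: list):
        self.origin = origin  # shared master image (never modified)
        self.cow = {}         # this client's writes, keyed by block number

    def read(self, block: int) -> bytes:
        return self.cow.get(block, self.origin[block])

    def write(self, block: int, data: bytes) -> None:
        if not 0 <= block < len(self.origin):
            raise IndexError(block)
        self.cow[block] = data

    def reset(self) -> None:
        """End of session: discard all writes, like recreating an empty COW."""
        self.cow.clear()
```

The COW store grows only with the number of distinct blocks a session writes, so it can be much smaller than the image it shadows. Resetting it does not depend on the image size.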

Initial trials of our iSCSI implementation worked perfectly for one or two computers, but performance dropped sharply when many computers were booted from a single application server. It turned out that reads from the master image disk could not keep up with the demand. When serving a large number of diskless Linux clients, the kernel's disk block caching ensures that repetitive file I/O is served from the cache, so performance is acceptable. However, the iSCSI disk driver bypasses the normal Linux I/O caching, so an alternative caching solution was needed. The solution was to stack a disk block cache on top of the physical volume allocated for the master image logical volumes. Because the cache is stacked on the physical volume, all images can avail of it without a separate cache being created for each logical volume. Fortunately, the folks at Facebook had run across a need for higher disk performance and had implemented a stackable raw disk block caching driver for Linux (Flashcache). Facebook used a solid state disk for its cache, but in our implementation a 2 gigabyte RAM disk was used to provide maximum performance. Performance improved by an order of magnitude, peaking at 116 megabytes per second.
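A toy model shows why one shared cache helps so many clients. The sketch below is a simplified read-through cache in Python with least-recently-used eviction. It assumes a hypothetical callable that reads one block from the backing store. Flashcache's real on-device layout and replacement policy differ, so treat this only as an illustration of the principle. Clients booting the same master image read largely the same blocks, so once one client has pulled a block in, the others are served from RAM.

```python
from collections import OrderedDict
from typing import Callable


class BlockCache:
    """Read-through block cache with least-recently-used eviction."""

    def __init__(self, capacity: int, read_backing: Callable[[int], bytes]):
        self.capacity = capacity          # maximum number of cached blocks
        self.read_backing = read_backing  # slow path: read from the disk
        self.blocks = OrderedDict()       # block number -> data, oldest first
        self.hits = 0
        self.misses = 0

    def read(self, block: int) -> bytes:
        if block in self.blocks:
            self.blocks.move_to_end(block)  # mark as most recently used
            self.hits += 1
            return self.blocks[block]
        self.misses += 1
        data = self.read_backing(block)
        self.blocks[block] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used
        return data
```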

The next performance bottleneck became noticeable when an even larger number of computers were started simultaneously. This bottleneck was a direct result of an inability to keep up with disk writes to the COW partitions. To remedy this, a solid state disk was used for the COW partitions. In a later implementation the solid state disk was replaced by another flashcache virtual disk, optimized for writing, stacked on top of another disk partition reserved for COW storage. Again, the caching device was a RAM disk dynamically allocated at system boot time.

During our trials, in a worst-case scenario, 45 computers were booted simultaneously. The boot time for all 45 computers was 14 minutes in a lab using 100 megabit Ethernet, compared with 7 minutes in a lab using 1 gigabit Ethernet.

The following illustration outlines the final disk layout that was used:




When we started using Windows 7, it was noted that it took an abnormally long time to log into Windows 7. This turned out to be a result of Windows 7 having to create a new user profile each time a user logs in. Normally when a user logs into a personal computer the profile has already been created so it takes far less time to log in. To remedy this delay, a default profile was created based on a pseudo user as a part of the image generation process. With this profile in place the login time was reduced to the 10 seconds it takes for the user's home directory (H: drive), the public (S: drive) and whatever printers the users can access to be mounted. The following sections will outline the configuration, image generation and deployment strategies for Windows 7 in an iSCSI environment.

3.3.3 ISCSI Appserver Configuration

Note that the screen shots shown in the following description are based on a generic Intel based server with a pair of 2 terabyte hard drives and 32 gigabytes of main memory. Other configurations are easily accommodated by filling in alternate sys_config database entries. The following is an outline of the steps that are required to build an application server:

  • Using the System Configuration Webtool, create a new server entity based on a previously configured Windows 7 appserver if available.

  • Fill out the mandatory variables as for any other server, except that the “functions” variable must have the samba_win7 and w7appserver values set in addition to basic_services. Also, the kernel must be 64-bit and support LVM and the flashcache module.

  • Set up the iSCSI control variables:




The values above can safely be used. This assumes that you have partitioned a RAID device (/dev/md2 in this case) large enough to accommodate all your Windows 7 images, and a RAID device (/dev/md3) large enough to accommodate all the COW partitions. The server's main memory must also be able to accommodate the RAM disks that support the image and COW caches. The sizes of these RAM disks are given in units of 1K blocks allocated from the server's memory. In the above example a total of 12 gigabytes is used, which is a reasonable allotment for a server with 24 to 32 gigabytes of memory. If you wish to use an SSD for the COW cache, do not set iscsi_cow_pv or iscsi_cow_cache_size; instead, set iscsi_cow_cache_dev to the name of the SSD, /dev/sdc for example.

  • Set the values in the “Rsync” table so that the application server will be built with the appropriate software configuration (see Section 3.1).




Note that “/” and “/var” are partitions used by the application server, whereas “/diskless” and “/diskless/vars” are the Linux images used by the diskless clients. The “/images/cs” directory contains the compacted images that you have prepared using the WIMP tool described in Section 3.4 and uploaded to the master server, “pooky” in this example. The “cs” subdirectory of “/images” is used to segregate images according to the departments or administrative groups responsible for maintaining them. In this case only the “cs” (Computer Science) images are copied over to our server. As a rule of thumb, you should only copy over images that could be used by the potential clients of this server. By “potential clients” we mean any clients that could be transferred to this server as a result of their normal server being out of service.

  • The “WindowsImageTab” table indicates which Windows 7 images this server serves to its diskless clients.




In this particular example there are two images that this server can serve, cs_lab and csi5. Both of these images must have been created by the WIMP process described in Section 3.4 which will have created entities of type “WindowsImages” with the same name. If this is a new appserver, the TimeStamp should be set to “none” for each image. The “TimeStamp” variable is set automatically by lnsysconfigd whenever it initiates the imagesync program to synchronize the images in the “/images/<dept>” directory with the master logical volume created for that image and used as the back end of the iSCSI target. The following is a screen shot of the csi5 “WindowsImage” entity:






Note that the “directory” variable holds the name of the directory in which this image is located. Also note that the “imageLvSize” and “cowLvSize” variables represent, respectively, the size of the iSCSI image partition as it appears on the client and the amount of space that a client can write to this partition. The creation and management of the data values in this entity is handled by WIMPng.

  • The following table “fstab” is used to construct the “/etc/fstab” system configuration file:




In this example the root partition is on a bootable RAID device that must be set up for RAID auto-detection so that the operating system can mount the disk at startup, because we do not use an initrd to start our systems. Note that the swap devices are specified by the physical device name prefixed with “UUID=”. In general, all physical devices are represented this way because they will be written to the “/etc/fstab” file in “UUID=<uuid number of filesystem>” notation, since physical device names can change. Logical device names are always represented literally in the fstab file. The rest of the file systems are located on logical volumes defined in another table described later. Note that the “fst_UUID” column must be set to “none”, as this field is set by the buildserver program when the system is built. The only other field not found in “/etc/fstab” is “fst_size”, which gives the size of the partition in bytes. This value is used by buildserver to determine the size of the partition to be built.

  • If logical volumes are used as in the example above then the “logicalVolumes” table will need to be set up as follows:




Note that the lv_uuid column must be set to “none”; it will be filled in by buildserver. The logical volume names must match those used in the “fstab” table, and all of the logical volumes should be in the same volume group. (Note: the “lvs -o lv_name,lv_uuid” command will print device names and their UUIDs.)

  • If software RAID devices are used then they must be defined in the “mdadm” table as follows:




You will notice that the fields are defined much like the fields in the “/etc/mdadm” file, except that the “md_UUID” column, which would normally contain UUIDs, holds physical device names instead. These physical device names are used only during the build process, to tell buildserver which physical devices to use to create the RAID array. In this example all the RAID devices are built from partitions of /dev/sda and /dev/sdb and are configured as mirrored RAID devices. As we have seen, “/dev/md0” is used as the root partition, and its size is determined from the “fstab” table described previously. The remaining RAID devices are used as physical volumes from which the various logical volumes are created. The sizes of these devices are defined in the “physicalVolume” table.

  • The “physicalVolume” table defines the devices that are used as physical volumes to support the logical volumes used by the server. The following is a sample configuration:




Note that we are using the mirrored RAID devices to provide redundancy rather than the redundancy built into the logical volume mechanism. In our example, the first physical volume device is part of “vg1” and provides space for the “/var”, “/diskless”, “/diskless/vars” and “/images” logical volumes defined in the “fstab” and “logicalVolumes” tables. The space defined in its “pv_size” must be larger than the total space used by the individual logical volumes. The second physical volume device is part of “vg2”, which supports the logical volumes for each of the Windows 7 images listed in the WindowsImageTab. The size specified in its “pv_size” must be large enough to hold the existing images plus any anticipated future expansion. The final physical volume device is used as the backend for the flashcache writable COW partitions if an SSD is not used. Just as in the three previously described tables, buildserver replaces the pv_uuid field with the UUID of the physical volume. (Note: the “pvs -o pv_name,pv_uuid” command will print device names and their UUIDs.)

  • Now that the new server entity has been created, the buildserver application can be run. Start by installing the number of empty disks required by the configuration into a computer, which may or may not be the final server. Boot this computer disklessly using the instructions in Section 2.9. To run buildserver, issue the following command:

    buildserver <server entity name>

    where <server entity name> is the name of the application server that was just created. For further information on the buildserver application, see Section 3.6. At the end of the build, the disks should contain a bootable image of the Windows 7 application server described by the server entity. Note that the UUIDs and the actual disk partition sizes will be updated to reflect their true values.

  • Note that buildserver can be run multiple times to iron out bugs in the configuration. Once the new server image has been built successfully, the disks can be transferred to the server (if not already there) and the BIOS set up to boot from the system disk. The system should then respond by loading the grub menu and booting the desired kernel.
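Two of the rules from the tables above can be expressed directly in code: physical devices are written to “/etc/fstab” in UUID= notation using the filled-in fst_UUID, and a physical volume must be larger than the logical volumes carved from it. The Python sketch below is hypothetical. The dictionary keys stand in for the table columns, and only the UUID= convention and the pv_size rule come from the text. It is not buildserver's code.

```python
def render_fstab(rows: list) -> str:
    """Render /etc/fstab text from rows of a hypothetical fstab table export.

    Rows whose device starts with "UUID=" describe physical devices and are
    written with the filesystem UUID that buildserver stored in fst_UUID.
    Any other device (e.g. a logical volume) is written literally.
    """
    lines = []
    for row in rows:
        device = row["device"]
        if device.startswith("UUID="):
            if row["fst_UUID"] == "none":
                raise ValueError(f"{device}: fst_UUID not filled in (run buildserver first)")
            device = "UUID=" + row["fst_UUID"]
        fields = [device, row["mount"], row["fstype"], row["options"],
                  str(row["dump"]), str(row["pass"])]
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n"


def pv_has_room(pv_size: int, lv_sizes: list) -> bool:
    """True if a physical volume is larger than the total of its LVs."""
    return pv_size > sum(lv_sizes)
```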

At this point you should have a Windows 7 appserver that is ready to boot. The following section describes the boot sequence in detail so that you can verify that all the steps are successfully carried out.

3.3.4 ISCSI Appserver Boot Sequence

The server is now ready to be booted. The kernel should be able to auto detect the physical disk partitions, assemble the root RAID array and mount the latter to start the process of bringing up the system. The following steps are involved with the startup of the Windows 7 iSCSI system:

  • System startup scripts initiate the sysfs file system and scan for RAID devices and physical/logical volume devices. In our example, each of the 2 physical disks has been partitioned into 5 partitions, 4 of which are used to make the RAID array devices /dev/md0 through /dev/md3. “/dev/md0” is used as the root partition, “/dev/md1” is used as the physical volume for lv01 through lv04, “/dev/md2” is the physical volume used to store the Windows 7 images and, finally, “/dev/md3” is the physical volume used to store the COW partitions. The virtual disk layout at this point should look like this:




    The mount command should indicate that lv01 through lv04 are mounted on “/var”, “/diskless”, “/diskless/vars” and “/images” respectively. The “/var” partition is the appserver's own /var partition. The “/diskless” partition stores the diskless client root partition, “/diskless/rootfs”, which is mounted via NFS. The “/diskless/vars” partition contains the “/var” partitions used by the diskless clients, one for each client. Finally, the “/images” partition contains all the Windows 7 compacted images used to build the actual Windows 7 virtual disks. For each Windows 7 image there should be three files, named <windowsImageName>, <windowsImageName>.bootsect and <windowsImageName>.bootpart. They contain, respectively, the compacted C: drive image, the boot sector for the virtual disk and the compacted image of the Windows 7 boot partition. The files for each of these file systems should have been put there by the buildserver process under the direction of the “rsync” table entry for this server entity. Ensure that these partitions have been created properly.

  • After these initial partitions have been set up, the flashcache script in “/etc/init.d” is run. Its function is to build the virtual disk caches for reading and writing used by the iSCSI devices. Two components are needed to implement the caches: the backing stores, “/dev/md2” and “/dev/md3”, which have already been created, and the high-speed caching devices built by the Linux ramdisk driver. First, the script determines the total size of the two ramdisk caches by adding together the values stored in the iscsi_image_size and iscsi_cow_cache_size variables. It then creates a ramdisk of that size, adds it to logical volume group vg3 and divides it into two ramdisk logical volumes, “/dev/mapper/vg3-ram_cache1” and “/dev/mapper/vg3-ram_cache2”. One is a read cache for the Windows 7 image partitions and the other a write cache for the client COW partitions. Next, the script runs the flashcache command to create the two cache-enabled virtual disks from the two backing store partitions and their respective cache devices. At this point the cache-enabled virtual devices “/dev/mapper/image_cache” and “/dev/mapper/cow_cache” should have been created, as shown below.






  • After the flashcache script has finished, the /etc/init.d/lnsysconfig script is run. It in turn starts the lnsysconfigd daemon, which is responsible for configuring the managed system configuration files based on the contents of the sys_config database. One of its tasks is to ensure that each of the Windows 7 images listed in the “WindowsImageTab” has a corresponding, up-to-date virtual disk associated with it. In this case it will create two new logical volumes from the vg2 physical volume “/dev/mapper/vg2-image_cache”. In this example, it will then set up the partitions for each of these disks using the “/images/cs/cs_lab.bootsect” and “/images/cs/csi5.bootsect” files. Following this, lnsysconfigd forks off a process, syncImage, that takes the compacted image files, in this case “/images/cs/cs_lab.bootpart”, “/images/cs/cs_lab”, “/images/cs/csi5.bootpart” and “/images/cs/csi5”, and uses them to initialize the corresponding partitions on the corresponding disks. Once it has completed, two new disks, “/dev/mapper/vg2-cs_lab_m” and “/dev/mapper/vg2-csi5_m”, should appear. Note that if any clients are assigned both to boot from this appserver and to boot one of these images, the creation of the Windows 7 images will fail. This is because lnsysconfigd performs a sanity check to make sure that no client computers are using an image before syncing that image. The file system should now appear as follows:






Using the names in our example, you can verify that the syncImage process has been successful by first running cfdisk on “/dev/mapper/vg2-csi5_m” and “/dev/mapper/vg2-cs_lab_m”. Both disks should have 2 partitions: a smaller boot partition and a larger partition that should be the Windows 7 C: drive. To further verify the contents of the partitions, you can run the kpartx command on the disks as follows, using the “/dev/mapper/vg2-csi5_m” image as an example:

kpartx -a /dev/mapper/vg2-csi5_m

This will result in the appearance of 2 new disks, “/dev/mapper/vg2-csi5_m1” and “/dev/mapper/vg2-csi5_m2”. You can then mount each of these file systems and browse their content to verify that they have been created properly. When you are done, make sure to first unmount the partitions and then run:

kpartx -d /dev/mapper/vg2-csi5_m

to remove the partition entries from /dev. If this is not done, problems can arise when using the iSCSI driver, because it will detect that the disk is in use and fail.
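Two other conditions mentioned in this section can also be checked from a script before investigating a failed sync. The first is that each image has its three compacted files in “/images/<dept>”. The second is that no client is assigned to an image that is about to be synced. The Python sketch below is hypothetical. It assumes the client assignments have been exported as a {client: windowsImage} mapping for the clients assigned to this appserver. It only mirrors the rules stated above and is not lnsysconfigd's code.

```python
from pathlib import Path

# The three files each Windows 7 image needs: <name>, <name>.bootsect, <name>.bootpart
IMAGE_SUFFIXES = ("", ".bootsect", ".bootpart")


def missing_image_files(images_dir: str, image: str) -> list:
    """List which of the image's three compacted files are absent."""
    base = Path(images_dir)
    return [image + s for s in IMAGE_SUFFIXES if not (base / (image + s)).is_file()]


def images_safe_to_sync(images: list, client_images: dict) -> list:
    """Return the images that no client is currently assigned to boot.

    `client_images` maps client name -> windowsImage for the clients
    assigned to this appserver.
    """
    in_use = set(client_images.values())
    return [img for img in images if img not in in_use]


if __name__ == "__main__":
    # Hypothetical department directory and assignments.
    for name in ("cs_lab", "csi5"):
        print(name, missing_image_files("/images/cs", name))
    print(images_safe_to_sync(["cs_lab", "csi5"], {"pc01": "csi5"}))
```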

At this point you should have a working and configured Windows 7 appserver. There are two ways to proceed, depending on whether or not you already have a working Windows 7 image made for this server. If you do, proceed with the steps described below in Section 3.3.5. If you need to build a Windows 7 image, skip Section 3.3.5 and proceed to Section 3.4, which describes the Windows image creation program, WIMP.

3.3.5 Booting Windows 7 Client

In this section the configuration and booting of diskless Windows 7 clients is discussed in greater detail. See Section 2.9 for the general setup of a diskless client for Linux, and make sure that the clients can boot under Linux before proceeding. The following steps will enable your diskless clients to boot Windows 7 using the iSCSI virtual disk provided by the appserver:

  • Bring up the client entity in the “System Configuration” webtool and modify the following variables to enable booting Windows 7:

    Variable                     Value
    server                       name of Windows 7 appserver
    bootfile                     lnundionly.kpxe
    menu_template                linwin_template or winlin_template
    windowsImage                 name of Windows 7 image
    iscsiPassword                12 random letters
    ipxe_menu                    menuTemplate
    run_level_name               ipsec
    splash_{full|wide}_screen    Name of file used as splash screen for login.

Note that there are two choices for “menu_template”: with linwin_template the computer boots Linux by default, whereas with winlin_template it boots Windows 7 by default.
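Before booting a client, the settings in the table above can be checked mechanically. This is only a sketch: it represents the client entity as a plain Python dictionary (a hypothetical stand-in for the sys_config entry), omits the optional splash screen variable, and also shows one way to generate a 12-letter iscsiPassword value. The function names are illustrative, not part of Labnet.

```python
import secrets
import string

# Variables from the table above (splash screen omitted as optional here).
REQUIRED = ("servers", "bootfile", "menu_template", "windowsImage",
            "iscsiPassword", "ipxe_menu", "run_level_name")
MENU_TEMPLATES = ("linwin_template", "winlin_template")


def make_iscsi_password(length=12):
    """Generate a password of random lowercase letters."""
    return "".join(secrets.choice(string.ascii_lowercase) for _ in range(length))


def check_win7_client(entity):
    """Return a list of problems with a (hypothetical) client entity dict."""
    problems = [f"{name} is not set" for name in REQUIRED if not entity.get(name)]
    if entity.get("bootfile") and entity["bootfile"] != "lnundionly.kpxe":
        problems.append("bootfile must be lnundionly.kpxe")
    if entity.get("menu_template") and entity["menu_template"] not in MENU_TEMPLATES:
        problems.append("menu_template must be linwin_template or winlin_template")
    pw = entity.get("iscsiPassword", "")
    if pw and not (len(pw) == 12 and pw.isalpha()):
        problems.append("iscsiPassword should be 12 letters")
    return problems


if __name__ == "__main__":
    rook = {
        "servers": "megatron", "bootfile": "lnundionly.kpxe",
        "menu_template": "linwin_template", "windowsImage": "cs_lab",
        "iscsiPassword": make_iscsi_password(), "ipxe_menu": "menuTemplate",
        "run_level_name": "ipsec",
    }
    print(check_win7_client(rook) or "ok")
```

The secrets module is used rather than random because the value acts as an iSCSI credential.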

  • Make sure that the BIOS of the client computer is set up to boot using the PXE Ethernet boot protocol.

  • Make sure that the following processes are running on the appserver:

      • lnsysconfigd – updates the system's configuration files

      • lntftpd – serves the boot program and boot menu via TFTP

      • menuserver – sets up the iSCSI targets for the diskless clients and provides configuration hints to the Windows 7 boot process

      • dhcpd – provides the IP address and transfers the name of the boot file and boot menu

  • Start the boot process on the client computer. A PXE boot message should appear while the client waits for a response from the dhcpd server. If nothing happens, make sure that dhcpd is running on the appserver and that “/etc/bootptab” contains an entry with this client's MAC address. The dhcpd daemon provides the client computer with an IP address and routing information as well as the name of the boot program to be used. This name is the value of the “bootfile” variable in the client's sys_config database entry; in this case it must be “lnundionly.kpxe”.

  • Once the client gets a response from dhcpd, the PXE code downloads the executable bootloader “lnundionly.kpxe” from the lntftpd daemon on the appserver. This bootloader has been customized for Labnet by compiling it with an embedded boot script that requests the following URL at startup:

    http://<appserver_name>/bootmenu

    All of our Windows 7 appservers run a customized web server daemon, menuserver, that accepts these requests and executes specialized code that queues tasks to create the iSCSI targets for the requesting clients and sends back customized menus generated from the menu template specified by the variable “ipxe_menu”. The following is an example of the menu returned to the client “rook”, which boots the image “cs_lab” from the appserver “megatron”:


    #!ipxe
    echo IP address: ${net0/ip} ; echo Subnet mask: ${net0/netmask}

    :linux
    imgfetch pxelinux.0
    kernel pxelinux.0
    set root-path "/diskless/rootfs"
    boot

    :windows
    imgfetch win7boot
    imgfree win7boot
    set initiator-iqn iqn.1989-06-ca.mun.cs.rook:system-disk
    set username iqn.1989-06-ca.mun.cs.rook:system-disk
    set password rookiscsipas
    set keep-san 1
    sanboot iscsi:megatron.cs.mun.ca::::iqn.1989-06-ca.mun.cs.megatron:cs-lab:rook

    Note that the initiator IQN, username, password and target are all provided so that this client can connect to megatron's iSCSI target as if it were a local disk. All of these particulars are supplied by the templating mechanism, which extracts the necessary information from the “sys_config” database to build this custom boot configuration file.

    Meanwhile, a thread in menuserver dequeues the target creation tasks one at a time to avoid contention in the LVM code. For each request the thread creates a snapshot logical volume by dynamically allocating a COW partition and linking it with the read-only master Windows 7 partition that holds the image specified by the requesting client's windowsImage variable. The thread then adds the resulting virtual disk to the iSCSI back end and initializes the kernel data structures that implement the iSCSI target.

  • The “lnundionly.kpxe” program then runs the script downloaded from menuserver, which fetches and runs the second bootloader, “pxelinux.0”. Note that “pxelinux.0” is loaded over HTTP from the menuserver daemon. Two bootloaders are required because the iPXE bootloader supports iSCSI targets but does not support the boot splash screen that lets users choose between Linux and Windows 7.

  • The “pxelinux.0” bootloader then queries dhcpd on the appserver, gets the name of its configuration file and downloads it using the lntftpd daemon. The name of the menu configuration file is specified by the variable “menu_template” and inserted into “/etc/bootptab” by lnsysconfigd. As an example, the following is the “pxelinux.0” menu file generated for client “rook”:

    UI vesamenu.c32
    ALLOWOPTIONS 0
    MENU TITLE Operating System Choices
    MENU BACKGROUND images/image.png
    MENU NOTABMSG Use arrow keys to select choice and press ENTER key
    MENU MARGIN 25
    MENU WIDTH 78
    MENU ROWS 3
    MENU TABMSGROW 8
    TIMEOUT 100
    MENU VSHIFT 3

    LABEL window7
    MENU LABEL Microsoft Windows 7
    LOCALBOOT

    LABEL gentoo
    MENU LABEL Gentoo Linux
    KERNEL bzImage.noarch-3.7.2-gentoo
    APPEND root=/dev/nfs init=/etc/rc.d/rc.preinit.moreminimal nfsroot=/diskless/rootfs,v3 ip=dhcp softlevel=ipsec-win

    Note that a background image is specified; it is downloaded from the web server, menuserver.

    Both menu scripts are templates and are therefore custom-made for booting a specific client computer based on the contents of the sys_config database. To view the actual content of the menu scripts, run the template through the templateCat utility.

  • If all goes well, a splash screen appears listing the boot choices, in most cases Linux and Windows 7. After a 10 second delay (TIMEOUT 100 above is in tenths of a second) the default operating system, the first one listed, is booted automatically. If the user selects another entry with the arrow keys and presses Enter, that system boots instead.

  • If Windows 7 is selected, the pxelinux.0 bootloader terminates (note the LOCALBOOT option under the Windows 7 entry above) and returns control to the “lnundionly.kpxe” bootloader, which has already registered the iSCSI boot information with the BIOS and now initiates the connection with the specified iSCSI target.

  • At this point you will briefly see the following message which is replaced in a matter of seconds by the Windows 7 boot screen:

  • During the latter part of the Windows 7 boot process, an automatic login is performed for a generic user with a predefined profile. The boot process was implemented this way to speed up login and to avoid generating user profiles over and over again. Next, the winlogin.exe Python script, installed as part of the image generation process, is run. It initializes the GUI for the startup script, determines the screen resolution and chooses the wallpaper file whose aspect ratio most closely matches that of the screen. The wallpaper itself is supplied by a URL request to the menuserver daemon. The script then executes the toggleHide function, which disables the Task Manager and kills explorer.exe, leaving the computer unusable until a user logs in. On the splash screen the script then displays two input fields for username and password. On some non-production versions, the winlogin.exe task can be terminated with the <Alt-F9> key combination, which is useful for testing; doing so reverts the computer to its state before the script was launched.

    On the appserver side, there should be two new devices: “/dev/mapper/vg2-<computer_name>_c” and “/dev/mapper/vg2-<computer_name>_c-cow”. The former is the virtual disk bound to the iSCSI target and the latter is the writable COW partition to which all writes are redirected. Here <computer_name> is the client's entity name (e.g. /dev/mapper/vg2-hawk_c and /dev/mapper/vg2-hawk_c-cow for the client 'hawk').


    Diagrammatic representation of the components added by menuserver during client boot.

  • When the user enters a username and password, the credentials are checked by attempting a net login to the IPC$ share on the appserver. If this succeeds, the winlogin.exe script executes the logon.vbs script located on the appserver, using the cached credentials from the net login. Keeping this script on the appserver allows the login process to be customized without rebuilding images. Its main task is to mount the user's home directory share H: and the public share S:. The host name of the Samba file server and the actual path of the home directory are determined by a URL request to menuserver. The logon.vbs script also sets up the print shares for this computer. At this point the user should be presented with a standard Windows 7 desktop. Opening “My Computer” should reveal H: and S: drives containing the user's home directory and the public share respectively, and the printer shares should also be visible.

  • When users finish their session they log out normally; the computer then reboots and starts again at the PXE boot process described in the fourth step above. The old COW partition for the session is deleted automatically and a new one is created at this point.
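The menuserver design described above, in which each boot request only queues work while a single thread performs the LVM and iSCSI operations one at a time, is a standard producer/consumer pattern. The sketch below illustrates that pattern with Python's queue and threading modules; create_target is a hypothetical placeholder for the snapshot and target setup, not menuserver's actual code.

```python
import queue
import threading

tasks = queue.Queue()
created = []  # records targets in the order the worker handled them


def create_target(client, image):
    """Hypothetical placeholder for: allocate COW volume, snapshot master, export via iSCSI."""
    created.append(f"{image}:{client}")


def worker():
    # A single consumer serializes all LVM/iSCSI work, so requests never run concurrently.
    while True:
        item = tasks.get()
        if item is None:          # sentinel value: stop the worker
            tasks.task_done()
            break
        client, image = item
        try:
            create_target(client, image)
        finally:
            tasks.task_done()


def handle_bootmenu_request(client, image):
    """Called once per HTTP request: queue the work and return immediately."""
    tasks.put((client, image))


if __name__ == "__main__":
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    for c in ("rook", "hawk", "bunting"):
        handle_bootmenu_request(c, "cs_lab")
    tasks.join()      # wait until every queued request has been processed
    tasks.put(None)
    t.join()
    print(created)
```

Because the HTTP handler only enqueues, the boot menu can be returned to the client at once while target creation proceeds in the background.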

3.4 Creating Windows 7 iSCSI images

With the advent of the new Windows 7 iSCSI boot procedure, a new methodology was needed to prepare and build Windows 7 images. Some aspects of the process have become significantly easier because the M: drive is no longer needed to store Microsoft software. However, some new problems have emerged, mainly centered on the initial installation of the image: installing is far easier on a client whose BIOS is iSCSI-aware than on one whose BIOS does not handle iSCSI well. WIMPng has been designed to help administrators navigate the pitfalls of creating Windows 7 iSCSI images.

The home page of the new WIMPng webtool allows the administrator to either start a new session or carry on with a previously started session as can be seen from the screen shot below:




The table on the right lists the sessions started by the webtools user currently logged in; clicking an entry in the Stored Sessions table resumes that WIMP session at the page where you left off. Otherwise, the user can start a new session by clicking the New Session button. An entity type called “wimpSession” stores all session-specific information. The name of a “wimpSession” entity matches the name of the corresponding “windowsImage” entity, so there can be only one active WIMP session for a given Windows 7 image. After the session has completed, the “wimpSession” entity is removed. Clients cannot boot an image using a COW partition while there is an active “wimpSession” for that image.

Selecting a stored session simply jumps to the stage in the build process where we left off, so instead we will start a new session by clicking the New Session button. This lets us see all the steps in the image creation process.



The following is a screen shot of the WIMP New Session page:




The buttons at the bottom of the screen select one of five modes of operation. The first three modes deal with the management and creation of iSCSI Windows 7 images; the remaining two deal with special-case computer images. This discussion focuses on the iSCSI image management modes, as follows:

  1. New Managed Install – In this scenario we are creating a new iSCSI image from scratch. To begin, we need an appserver, preferably one that is not busy, and a client computer whose hardware is identical to that of the computers we want to image. Note that, unlike Linux images, Windows 7 images only work properly on homogeneous hardware configurations. Since we are starting an image from scratch, push the button labeled “New Managed Install”. This brings us to the following page:


    Here we are asked to select the Windows 7 application server that will be used to support this client. After selecting the server, “megatron”, the following additions appear on the screen:


    In this screen shot the client “bunting” has been selected as the client on which we build the image; the clients available to choose from are those that megatron serves. The image name must not already exist and will be associated with the client “bunting”. The image size specifies the size of the virtual disk that Windows 7 will see, so it should be large enough to hold all the software that you intend to load onto the C: drive; in this case 50 gigabytes has been chosen. Finally, the size of the writable portion of the COW partition is specified. This value is the maximum number of bytes that can be written to the C: drive during a login session; in this example 2 gigabytes have been specified. Recognized size suffixes are t/T, g/G and m/M for terabytes, gigabytes and megabytes respectively.

    Note that the COW partition is not used during the creation of the image: we connect directly to the image partition in read/write mode. Once the image has been finalized and a snapshot taken, the image is set to read-only and each client virtual disk assigned to this image uses its own COW partition. Continuing to the next step displays the following page:


    Clicking on each of the numbered entries results in an expanded view with detailed instructions on how to accomplish the listed task. We are now ready to begin the actual Windows 7 installation.

    At this point it may be instructive to review the behind-the-scenes work that WIMPng has done so far. A new “windowsImage” entity has been created and initialized, the image name has been added to the “WindowsImageTab” on the server “megatron”, and a “wimpSession” entity of the same name has been created. WIMP has also instructed the server “megatron” to create a Windows 7 logical volume of the size specified on the previous page (50G); a simple ls command can be used to verify that the device “/dev/mapper/vg2-cs_lab2_m” exists. In addition, WIMP has instructed the server to connect this empty virtual disk to an iSCSI target that is configured to accept a connection from “bunting”. This connection is not associated with a COW partition, as we are going to write directly to this disk. Note that the client's “windowsImage” variable is not set, as it is only set once the image has been created and the client boots from a COW partition. To verify that the above steps have been successful, issue the lvs command; in this example the following line should appear in the list of logical volumes:

cs_lab2_m vg2 -wi-ao 50.00g
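The size entered on the WIMPng page and the LSize column reported by lvs can be compared programmatically. This is a sketch under stated assumptions: it treats the suffixes as binary units (1G = 1024 MiB), which is how LVM interprets them, assumes WIMPng does the same, and assumes the default lvs column order shown above (LV, VG, Attr, LSize). The function names are illustrative.

```python
# Binary units, as LVM uses them (assumed to match WIMPng's interpretation).
UNITS = {"m": 1024 ** 2, "g": 1024 ** 3, "t": 1024 ** 4}


def parse_size(text):
    """Convert '50G', '2g' or lvs-style '50.00g' to bytes."""
    text = text.strip().lstrip("<")   # newer lvs versions prefix rounded sizes with '<'
    unit = text[-1:].lower()
    if unit not in UNITS:
        raise ValueError(f"unrecognized size suffix in {text!r}")
    return int(float(text[:-1]) * UNITS[unit])


def lv_size_matches(lvs_line, requested):
    """Compare an lvs output line (LV VG Attr LSize ...) with a requested size."""
    fields = lvs_line.split()
    name, size = fields[0], fields[3]
    return name, parse_size(size) == parse_size(requested)


if __name__ == "__main__":
    print(lv_size_matches("cs_lab2_m vg2 -wi-ao 50.00g", "50G"))  # ('cs_lab2_m', True)
```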

Also the “lio_node --listendpoints” command displays all iSCSI targets that have currently been created and should contain, in this example, the following lines:

\------> iqn.1989-06-ca.mun.cs.megatron:cs-lab2
\-------> tpgt_1 TargetAlias: LIO Target
TPG Status: ENABLED
TPG Network Portals:
\-------> 134.153.48.11:3260
TPG Logical Units:
\-------> lun_0/cs_lab2 -> target/core/iblock_0/cs_lab2

To ensure that the iSCSI target is configured to connect with the right client the following command can be issued:

lio_node --listnodeacls iqn.1989-06-ca.mun.cs.megatron:cs-lab2 1

Note that the iSCSI Qualified Name (IQN) argument is simply the first line of the output of the listendpoints command above. The final argument is the TPG number, which in our case is always 1. If the target has been associated with a client but the client has not yet connected, the following lines are output:

\------> InitiatorName: iqn.1989-06-ca.mun.cs.bunting:system-disk
No active iSCSI Session for Initiator Endpoint

If the client has connected, the following lines are output:

\------> InitiatorName: iqn.1989-06-ca.mun.cs.bunting:system-disk
InitiatorAlias:
LIO Session ID: 2 ISID: 0x40 00 01 37 00 00 TSIH: 2 SessionType: Normal
Session State: TARG_SESS_STATE_LOGGED_IN
---------------------[iSCSI Session Values]-----------------------
CmdSN/WR : CmdSN/WC : ExpCmdSN : MaxCmdSN : ITT : TTT
0x00000010 0x00000010 0x0004ccaf 0x0004ccbe 0x0004ccae 0x000433fa
----------------------[iSCSI Connections]-------------------------
CID: 1 Connection State: TARG_CONN_STATE_LOGGED_IN
Address 134.153.49.213 TCP StatSN: 0xc91cd0e7
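When several clients need checking, this test can be scripted. The sketch below runs the lio_node command shown above and reports, for each InitiatorName, whether an active session exists. It recognizes only the two output forms reproduced in this guide, so other lio-utils versions may need adjustment; the function names are hypothetical.

```python
import subprocess


def initiator_sessions(output):
    """Map each InitiatorName in 'lio_node --listnodeacls' output to True if logged in.

    Parses only the two output forms shown in this guide.
    """
    status = {}
    current = None
    for line in output.splitlines():
        if "InitiatorName:" in line:
            current = line.split("InitiatorName:", 1)[1].strip()
            status[current] = False
        elif current and "TARG_SESS_STATE_LOGGED_IN" in line:
            status[current] = True
    return status


def check_target(target_iqn, tpg="1"):
    """Run lio_node for one target (needs root on the appserver) and parse the result."""
    out = subprocess.run(["lio_node", "--listnodeacls", target_iqn, tpg],
                         capture_output=True, text=True, check=True).stdout
    return initiator_sessions(out)
```

For example, check_target("iqn.1989-06-ca.mun.cs.megatron:cs-lab2") on megatron would return {"iqn.1989-06-ca.mun.cs.bunting:system-disk": True} once bunting has connected.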

Notice that in both cases the name of the client host appears in the “InitiatorName” IQN. To clarify the IQN nomenclature: each end of the iSCSI connection has its own unique name. The IQN for the “target” (server) end is in this case:

iqn.1989-06-ca.mun.cs.megatron:cs-lab2

and the IQN for the “initiator” (client) end is in this case:

iqn.1989-06-ca.mun.cs.bunting:system-disk

When the target is being modified directly by the client, as in the above example, the generic target IQN has the form:

iqn.1989-06-<reverse DNS name of server>:<image name>

However, when the target is associated with a COW partition for a specific client (see Section 3.3.5), then the generic target IQN is as follows:

iqn.1989-06-<reverse DNS name of server>:<image name>:<client name>

Note that the generic initiator IQN is the same whether the client connects directly to an image for installation or via a COW partition:

iqn.1989-06-<reverse DNS name of client>:system-disk

Also note that “1989-06” in the IQN is the year and month that the “mun.ca” domain was registered by the Department of Computer Science.
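The naming rules above can be captured in a few lines. The sketch below builds the target and initiator IQNs and the iPXE sanboot URI that appears in the boot menu of Section 3.3.5. Two details are inferred from the examples rather than stated: host names are reversed after “1989-06-”, and underscores in image names become hyphens (cs_lab2 appears as cs-lab2), presumably because underscores are not allowed in iSCSI names. The function names are illustrative.

```python
def reverse_dns(fqdn):
    """'megatron.cs.mun.ca' -> 'ca.mun.cs.megatron'."""
    return ".".join(reversed(fqdn.split(".")))


def target_iqn(server_fqdn, image, client=None):
    """Direct (install) target IQN, or the per-client COW form when client is given."""
    # Inferred from the examples: underscores in image names become hyphens.
    iqn = f"iqn.1989-06-{reverse_dns(server_fqdn)}:{image.replace('_', '-')}"
    return f"{iqn}:{client}" if client else iqn


def initiator_iqn(client_fqdn):
    """Initiator IQN, identical for direct and COW connections."""
    return f"iqn.1989-06-{reverse_dns(client_fqdn)}:system-disk"


def sanboot_uri(server_fqdn, iqn):
    # iPXE iSCSI URI: iscsi:<server>:<protocol>:<port>:<lun>:<target>;
    # empty fields take iPXE's defaults.
    return f"iscsi:{server_fqdn}::::{iqn}"


if __name__ == "__main__":
    t = target_iqn("megatron.cs.mun.ca", "cs_lab", "rook")
    print(t)                                    # iqn.1989-06-ca.mun.cs.megatron:cs-lab:rook
    print(initiator_iqn("rook.cs.mun.ca"))      # iqn.1989-06-ca.mun.cs.rook:system-disk
    print(sanboot_uri("megatron.cs.mun.ca", t))
```

The printed values match the rook/megatron boot menu shown in Section 3.3.5.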

Enough of the “behind the scenes” explanations; on to the image build instructions. First, set up the BIOS on your diskless client, in this case “bunting”, so that it boots using the PXE boot option on the Ethernet port, and set the second boot device to be the CDROM. Make sure that the appropriate Windows 7 install disk is loaded. Next, boot the diskless client. You should see the DHCP requests going out and the “lnundionly.kpxe” bootloader message, followed shortly by the pxelinux.0 bootloader message and the bootloader splash screen. Select Windows 7 as the operating system. The bootloader should then print a message indicating that it is attempting to boot from the iSCSI target, but since no OS is present it will immediately fail and, hopefully, fall back to the CDROM device. The Windows install disk in the CDROM will then boot as usual.

At some point the installer will ask whether you want to perform an install, to which you respond affirmatively. A list of suitable install disks should then be displayed. It is not a good idea to have a physical disk installed in the client, as this can complicate the install process, so make sure that its SATA cable is unplugged for now. If you see a device of the appropriate size, select it and you are good to go. If you do not see any devices, the system does not have the appropriate network drivers for your chipset. Select the custom driver install option, load the CD containing the drivers for your motherboard and install the Ethernet drivers (refer to the instructions supplied by WIMPng). It is best to install all the network drivers just to make sure. The iSCSI disk should now be displayed in the list of install disks. Make sure that you reload the Windows install disk in the CDROM before selecting the disk, or else you will get an unintelligible error message.

Follow the standard Windows 7 install procedures, including the installation of updates. If the install procedure asks you to reboot your computer, go ahead, since the iSCSI target is the default boot device and should boot successfully at this point. These steps are outlined in drop-down text box number 1 as follows:




If your BIOS does not fall back to the CDROM after the iSCSI boot fails, follow the steps outlined in item 1.5 of the drop-down text boxes.

Next follow the steps outlined in drop down text box number 2 as shown in the following screen shot:




These steps explain how to answer the basic questions asked by the Microsoft installation software when setting up your image. When done, press the “Proceed to Windows Activation” button.

In this section the KMS (Key Management Service) is set up. This service is necessary because we are imaging clients, and they cannot all share the same key. Essentially, the clients must register each time they reboot in order to receive authorization to run. For non-imaged computers this is normally done once, when Windows is first installed, but in our case it happens every time a client reboots. Connecting to Microsoft on each reboot is clearly not an option, as it would appear to Microsoft that you have more licenses than you have been issued. To get around this problem, Microsoft allows institutions to run a KMS server that handles these key service requests locally. The following screen shot describes the KMS procedure:




Essentially, this process hard-codes the location of the KMS server into the image so that requests are redirected to it rather than to Microsoft. After completing this step, press the “KMS Setup Complete” button.

We are now ready to install software for our image. The following screen shot provides URLs for standard programs that we add to our image:




Press the “Program Installation Complete” button when finished with program installation. You need not install all the software intended for this image now, as you can return to this step at any time using the WIMPng resume option on the main page.

The following screen shot outlines the steps that must be taken to enable/disable features and set policies needed by Labnet:




The instructions are self-explanatory; once completed, press the “Configuration Complete” button.

This step installs the script used to orchestrate the Labnet login process. This program and its associated files can be downloaded from our webtools server using the URL shown in the following screen shot:




Note that you will have to download an unzip program if you have not already done so. When done installing the scripts, press the “Continue to Image Prep” button.

In this step we remove traces of this session's activity from the user's profile, as this is the profile that users will inherit by default when they log in. Note the special instructions that must be performed if you are making an image from a hard disk.




At the end of this process the web page prompts for the name of the directory in which your images will be archived for distribution to other servers. If this is not set properly, or you do not have sufficient privileges to access this directory, contact a member of our team to have the directory set up; the steps described on the following page cannot be done until it is. When done, press the “Ready to Image” button.

The final step involves the archiving of your image and the steps are outlined in the following screen shot:




The commands outlined above are run on the appserver that booted your client, so you must have root access to this server. The [img_name] argument is the name of the image that you have just created; this command archives the newly created disk image. Note that for a Hard Drive First install you will instead have to run the imaging command on the client itself, booted into Linux with the /image/ directory mounted. If you do not complete this step and anything happens to your image, either on the master iSCSI partition or on the hard drive, you will have to start again. That being said, you can defer this step and, in the case of an iSCSI install, go ahead and boot multiple clients from the image, although this is not recommended. When done, press the “Imaging Complete” button.

There are two stages to the imaging process. The first stage, which we have just completed, captures the image and archives it as a compacted file containing a snapshot of the whole partition; this archive can later be used to reconstruct the partition by running the syncImage command. The first stage is accomplished with the following command:

iscsiadmin --image <image name>

The second stage copies this archived partition to the master image server. To determine where your image files should go, look up the “rs_host” and “rs_src” values of the “Rsync” table entry that synchronizes your image directory; together they give the location where your master images are stored. The generic form of the command is:

rsync -a <rs_dst>/<image name>* <rs_host>:<rs_src>
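The same command can be assembled from the Rsync table values in a script. In the shell form the trailing * is expanded by the local shell, so the sketch below expands it with glob before building the argument list, which can then be passed to subprocess.run without shell=True. The helper name is hypothetical, and the values in the usage note are taken from the megatron example that follows.

```python
import glob
import os


def push_image_cmd(rs_dst, rs_host, rs_src, image):
    """Build the argv for: rsync -a <rs_dst>/<image>* <rs_host>:<rs_src>.

    The '*' is expanded here with glob, as the shell would do.
    """
    pattern = os.path.join(glob.escape(rs_dst), glob.escape(image) + "*")
    files = sorted(glob.glob(pattern))
    if not files:
        raise FileNotFoundError(f"no archive files for {image} in {rs_dst}")
    return ["rsync", "-a", *files, f"{rs_host}:{rs_src}"]
```

On megatron, subprocess.run(push_image_cmd("/images/cs", "pooky", "/mnt/images/cs", "cs_lab2"), check=True) would perform the same copy as the example command. Raising an error when nothing matches avoids the shell behaviour of passing an unexpanded pattern to rsync.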

For example by logging into the appserver “megatron” and issuing the following command:

rsync -a /images/cs/cs_lab2* pooky:/mnt/images/cs

we would save the new image “cs_lab2” to the master image repository given the following “Rsync” table entry for “megatron”:




Assuming that you have several appservers sharing this directory, the lnrsync utility will, each night, refresh all of them with the updated and new images, so that all your servers have access to the same set of images regardless of which server was used to create them. You should now add any new image names, with a “TimeStamp” of “none”, to the “WindowsImageTab” table of these other appservers so that lnsysconfigd can automatically create the master image disks for these images. Once the image disks have been created or updated, these servers can also serve the images to all eligible clients. In the following screen shot, note that this server has been set up to create disk images for the cs_lab2 and demo_img images when they become available after the nightly lnrsync:




Note that in the case of an updated image, the “TimeStamp” value should change to reflect the time of the last rsync. If this date is not more recent than the date of the files in the archive directory, lnsysconfigd may need to be sent a USR1 signal (kill -USR1 <lnsysconfigd pid>) to make it reread its configuration files. A ps ax | grep syncImage command will show whether the sync process has started.
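The check described above amounts to comparing an image's TimeStamp with the modification times of its archive files. The sketch below expresses that comparison. It assumes, hypothetically, that TimeStamp is stored either as “none” or as Unix epoch seconds, and it is not lnsysconfigd's actual code.

```python
import glob
import os


def needs_sync(timestamp, archive_dir, image):
    """Return True if the image disk is older than its archive files.

    timestamp: "none" (disk never built) or Unix epoch seconds (assumed format).
    """
    pattern = os.path.join(glob.escape(archive_dir), glob.escape(image) + "*")
    files = glob.glob(pattern)
    if not files:
        return False              # nothing archived yet, e.g. before the nightly lnrsync
    if str(timestamp).lower() == "none":
        return True               # new image: no disk has been created yet
    newest = max(os.path.getmtime(f) for f in files)
    return newest > float(timestamp)
```

If needs_sync returns True but ps shows no syncImage process, that is the situation in which sending lnsysconfigd the USR1 signal described above may help.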

Congratulations, you have successfully built an iSCSI image that can now be used to boot your lab. Assuming that your clients are already configured, all you have to do is set the “windowsImage” variable for each client and reboot it. See Section 3.3.5 for details on configuring and booting Windows 7 clients.

  2. Update Managed Image – This selection is used to update an existing image. Again we start with a diskless client associated with a Windows 7 appserver that already has a copy of the image to be updated. The image must not be associated with any clients: no client served by this appserver may have its “windowsImage” variable set to the name of the image being updated. The process follows the general pattern described in point 1 above.

  3. Clone Managed Image – This selection is used to make a clone of an image that is already in use, for the purpose of performing updates. The update process is identical to the Update Managed Image process described in point 2 above, except that by cloning the image we do not have to worry about client computers being associated with it. This process can therefore be used on an appserver that is already serving iSCSI targets to clients using the image we want to modify. Once the modifications to the cloned image have been completed, the clients can all be moved to the new image by changing the “windowsImage” variable to the cloned image name for each client that should use it. Another advantage of Clone Managed Image is that you can make the partition bigger or smaller, allowing you to add partitions or change the size of the C: drive. You will, however, need to use the partimage utility to resize the C: drive after it has been rsynced over. Once the initial cloning has taken place, the software installation process is much the same as for updating an image.

  4. Delete Managed Image – There should be a way to delete an image?

3.5 Windows XP Imaging

The final image management class is Microsoft Windows images. As mentioned earlier, these images fall into two categories: those copied to the client computer's hard drive and those mounted as the M: shared software network drive. In general, Microsoft images depend much more heavily on the hardware components, so it is necessary to create images for each distinct computer hardware configuration. To facilitate and standardize this process, a webtool has been developed to produce and manage both the C: and the M: images. The process is currently being revised for the production of Microsoft Windows 7 images, but a brief overview of our current Microsoft Windows XP process gives a general idea of what it entails. The following screen shot shows the first page of the WIMP webtool:




The general procedure is to start this webtool on a computer next to the computer for which you want to create an image. It is assumed that the computer to be imaged already has a client entity and runs diskless Linux. The name of this client computer is entered in the Machine hostname field. As an example, the computer 'jay' has been selected as the host to be imaged. Clicking the Select Machine button loads the following page:




Note that there are three imaging options. The first is used for computers that are reimaged whenever a user logs out; the remaining two are used for managing images for computers that are imaged on demand. Our labs are primarily imaged using the Install/Upgrade Image option. The upper right of the screen shot lists the ten steps performed during the imaging process; some of these steps are skipped for an upgrade.


The following is a brief description of what actions are performed during each of these imaging phases:


  1. Start – Enter the name of the machine for which you wish to create, restore or manage an image

  2. Select Type – Choose one of three tasks: Create/Upgrade an image, Retrieve/Restore an image, or Delete an image. The third task is for managing the image on the 4th partition of the client hard drive; these images are used with standalone imaging. If Upgrade is selected, you jump to step 6, Software Setup.

  3. WinXP Installation – Initial install of Windows

  4. Domain Setup – Add computer to the Windows domain

  5. LabNet Setup – Run registry modification and other customizations

  6. Software Setup – Install software to network drive

  7. Image Preparation – Remove temporary files and defragment the hard drive

  8. Imaging – Create image on the server or the 4th partition for a standalone client

  9. Testing – Reboot and test the image. This may be an iterative process if changes are indicated.

  10. Finalize – Accept the image for distribution
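The step sequence, including the jump from Select Type to Software Setup when an existing image is upgraded, can be summarized in a few lines. This is an illustrative sketch of the flow described in the list above, not WIMP's code.

```python
# Illustrative only: the ten WIMP phases in order.
STEPS = ["Start", "Select Type", "WinXP Installation", "Domain Setup",
         "LabNet Setup", "Software Setup", "Image Preparation", "Imaging",
         "Testing", "Finalize"]


def wimp_steps(upgrade):
    """Return the phases performed; an upgrade skips steps 3 to 5."""
    if upgrade:
        return STEPS[:2] + STEPS[STEPS.index("Software Setup"):]
    return list(STEPS)


if __name__ == "__main__":
    print(wimp_steps(upgrade=True))
```

In practice the Testing phase may send you back to earlier phases, which this linear sketch does not model.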


3.6 Building Servers

In general, if we assume that the system software on any given computer is composed of a collection of binary files that are the same across all servers, along with appropriately set up configuration files, then it should be possible to generate any one of a number of server systems simply by copying a generic software image to an empty system drive and then layering in the configuration files. This is exactly what we do when deploying a new Labnet server. In fact, we have developed a simple but effective server build program that can build any server described in our server entity database. The first step is to create a server entity describing the specific aspects of this server, either by cloning an existing entity or by creating a new one from scratch. To clone an existing entity, select an appropriate server entity and click the “Use as Template” button. Alternatively, from the main page select the “Entity Creation” button, fill in the name of the entity that you want to use as a template, select the entity type and click the “Next” button. See Section 2.2 for screen shots and general information on how to create a new entity. Note that if you do not supply a template value, no template is used and you must supply all the appropriate values manually.

The following screen shot shows some of the basic information that is needed to build the application server 'gasket':



Note that there are Mandatory and Optional variables. The mandatory variables must all be set before the entity is considered to be “active”; an entity that is not active will not be propagated by the 'lnsysconfigd' daemon. The most important variables contain the IP information, the kernel to be loaded and the server functions that determine which services this server will run at startup. The “functions” variable indicates that “gasket” is a syncserver, a caching DNS server and a backup server. This variable controls the generation of the runlevels directory structure for this server so that the proper scripts are run at startup to bring up the daemons that support this server's functions. Managed file templates also consult this variable to ensure that configuration files are set up appropriately. The “ip” and “kernel” variables are used by the templates that generate the boot menu file “menu.lst” and the “/etc/hosts” file. The IP information is also propagated to the DNS servers for inclusion in the Labnet DNS.
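The two rules in this paragraph, that an entity is active only when all mandatory variables are set and that the “functions” variable selects the services started at boot, can be sketched as follows. The list of mandatory variables, the function names and the service-to-function mapping are hypothetical examples; in Labnet this information lives in the server and “service” entities of the configuration database.

```python
# Hypothetical subset of the mandatory server variables.
MANDATORY = ("ip", "kernel", "functions")

# Hypothetical "service" entities: service -> (runlevel, functions that need it).
# "*" marks a service that every server runs.
SERVICES = {
    "sshd": ("default", {"*"}),
    "lnsysconfig": ("default", {"*"}),
    "rsyncd": ("default", {"syncserver"}),
    "named": ("default", {"dns"}),
    "mdraid": ("boot", {"*"}),
}


def is_active(entity: dict) -> bool:
    """Active only when every mandatory variable has a value other than N/A."""
    return all(entity.get(name, "N/A") not in ("", "N/A") for name in MANDATORY)


def runlevel_links(entity: dict) -> dict:
    """Return {runlevel: sorted service names} for this server's functions."""
    functions = {f.strip() for f in entity["functions"].split(",")}
    links: dict = {}
    for service, (runlevel, needed_by) in SERVICES.items():
        if "*" in needed_by or needed_by & functions:
            links.setdefault(runlevel, []).append(service)
    return {level: sorted(names) for level, names in links.items()}


gasket = {"ip": "192.0.2.10", "kernel": "vmlinuz-example",
          "functions": "syncserver, dns, backup"}
print(is_active(gasket))                # True
print(runlevel_links(gasket)["boot"])   # ['mdraid']
```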


Not all of the Optional variables are set, as indicated by the “N/A” values. Some variables, such as ssh_key, are set by other programs and should be left alone. Others, such as the “iscsi_xxx” variables, are used only by Windows 7 application servers. In general, you should make sure that the ldapsrv and ntpserver variables are set properly for your site, as well as sambaSID if the server is used to serve files to Microsoft Windows clients.



The next section of the Webtools page contains a series of standard tables used to configure information of a more tabular nature. The following tables are currently defined:


Table Name          Description
------------------  --------------------------------------------------------------------------
Access              Used by “access.conf.template” to create “/etc/security/access.conf”
BackupTable         Used directly by the backupsync and bkupclean utilities to control backups
Cname_Table         Used by the lnsysconfigd utility to configure CNAME records for the DNS servers
Config_Files_Table  Used to assign this host to a named group for a specific file (see section xxx)
Exports             Used by “exports.template” to create “/etc/exports”
Networks            Used by servers with more than one active network port
Rsync               Used directly to control the rsyncing of images to the various partitions
WindowsImageTab     Used to define the Windows 7 images supported by this application server
crontab_tasks       Used by “ln_crontab.template” to create the “/etc/cron.d/lncrontab” file
fstab               Used by “fstab.template” to create “/etc/fstab”, and by the buildserver utility
logicalVolumes      Used by the buildserver utility to set up the static logical volumes
mdadm               Used by “mdadm.conf.template” to create “/etc/mdadm.conf”, and by buildserver
physicalVolumes     Used by the buildserver utility to set up the physical volumes



For the purpose of building a new server, the following tables are described in more detail:

  • BackupTable – Contains the information that directs the backup process. Columns are as follows:

    1. bk_directory contains the name of the directory on this server that is to be backed up.

    2. bk_server contains the name of the server that will do the backup.

    3. bk_basedir is the directory in which the mangled directories that receive the backup are created. The mangled directory names preserve the server and directory information.

    4. bk_on is a flag that is either on or off indicating whether or not the backup goes ahead.

    5. bk_filter references another webtool entity of type “rsync_filter” that can be used to reject various files and directories during the rsync process.

    6. bk_date is used to name the directory into which the incremental backup is to be performed and must be set to “none” when first initialized. The backup utility will set this value automatically to indicate the date and time of the last successful backup.

    7. bk_tsm_date is similarly set by the tsmbkup backup utility for files that are to be archived on tape.

    See Section 4.4 for more information on the backup system. The following screen shot shows the backup table before being processed by buildserver:


  • Rsync – Contains the information that determines your system's basic software layout. This table directs the lnrsync utility, which keeps your system software up to date or, for a new server, loads the initial software distribution. The columns and their meanings are as follows:

  1. rs_host is the host from which the image is copied.

  2. rs_src is the directory from which the image is copied. Note that the distribution server may support multiple images in different directories.

  3. rs_dst is the destination directory on this host that receives the rsynced data.

  4. rs_sync is a flag that is either “yes” or “no” indicating whether or not to perform the rsync.

  5. rs_date is the date and time of the last successful rsync. It is used by the lnrsync utility to determine if the daily rsync has been successfully completed for that day. It should be set to “none” if this system is being built by buildserver.

See section 3.1 for more information on the lnrsync process.

Example of an Rsync table before configuration:



    Once buildserver has rsynced the various partitions, the dates will be set as follows:



  • fstab – Contains the information required by buildserver to lay out the disk structure; it is also used to build the “/etc/fstab” file. In the example below there are two additional columns that do not appear in “/etc/fstab”: “fst_uuid” and “fst_size”, which buildserver uses to locate each partition and to determine its size, respectively. All fst_uuid fields must initially be set to “none”, as they will be initialized by buildserver. If an fst_uuid field is set to a file system UUID, buildserver assumes that the file system has already been created and attempts to find it. This is useful when buildserver is run multiple times, since it can then skip creating any file system partitions made by a previous pass. The rest of the fields are simply passed along to the fstab.template to generate the “/etc/fstab” configuration file. There is one caveat concerning the fst_dev column, which forms the device name column of “/etc/fstab”: if this field is prefixed with “UUID=”, the device column of “/etc/fstab” will contain UUID=<UUID from the fst_uuid column>. This avoids physical device names, which may change with the position of the disk drives on the bus and cause the system to come up incorrectly. Logical device names can be used with impunity, as they are scanned and mapped by their internal UUIDs. Notice that the physical device name is specified for the swap device. This is for the benefit of buildserver, which uses this information to set up and format the disk systems; when the information is no longer needed, it is removed from the database. The following is an example of an “fstab” table as it would look before being processed by buildserver:



Notice that the fst_uuid values are set to “none” and that the fst_size values are in bytes. After buildserver has successfully created and formatted the file systems, the fstab table looks like this:



Note that the fst_uuid fields have been initialized for all file systems and that the physical device name for the swap device has been removed. The fst_size values have also been changed to reflect the actual sizes selected during the partitioning process.

  • LogicalVolumes – This table describes the logical volumes that buildserver sets up and the volume group to which each belongs. Once the logical volumes have been set up, their lv_uuid values are inserted into the table. There must be an entry in this table for every logical volume referenced in the fstab table. Here is the LogicalVolumes table before and after being processed by buildserver:


Before

After





  • mdadm – This table is used by buildserver to set up the RAID devices, and by “mdadm.conf.template” to generate the “/etc/mdadm.conf” file. Its structure conforms to that of “/etc/mdadm.conf”, where the md_UUIDs are used to assemble the RAID arrays. As with “fstab”, buildserver initially requires the physical (or logical) devices to be specified in place of the “md_UUID”, so a comma separated list of devices must be entered in the “md_UUID” field before running buildserver. The following example shows the mdadm table before and after processing by buildserver:






  • physicalVolumes – This table is used by buildserver to initialize the physical volumes used by the logical volume manager. The sum of the sizes specified in the “pv_size” fields for a particular volume group must be larger than the total size of the logical volumes in that volume group. Initially, the “pv_uuid” field must contain the name of the device that buildserver will use as the physical volume. After adding the physical volume to the volume group specified by the “pv_vgroup” field, buildserver replaces the device name in the database with the “pv_uuid” of the device. A simple example of the physicalVolumes table before and after processing by buildserver is as follows:


Before

After
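The rules given above for the fstab and physicalVolumes tables can be sketched as follows: rows whose fst_uuid is still “none” mark file systems that must be created, an fst_dev value prefixed with “UUID=” is rendered using the fst_uuid value, and the physical volumes of a volume group must be larger than its logical volumes. The column names fst_mount, fst_type and fst_opts, the fixed dump/pass fields and the example device and UUID values are hypothetical; fstab.template itself is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class FstabRow:
    fst_dev: str                 # device column; may carry a "UUID=" prefix
    fst_mount: str               # mount point (hypothetical column name)
    fst_type: str                # file system type (hypothetical column name)
    fst_opts: str = "defaults"   # mount options (hypothetical column name)
    fst_uuid: str = "none"       # "none" until buildserver creates the file system
    fst_size: int = -1           # partition size in bytes


def needs_creation(rows):
    """Mount points whose file systems buildserver must still create."""
    return [r.fst_mount for r in rows if r.fst_uuid == "none"]


def fstab_line(row):
    """Render one /etc/fstab line; dump and pass are fixed at 0 for brevity."""
    device = row.fst_dev
    if device.startswith("UUID="):
        if row.fst_uuid == "none":
            raise ValueError(f"{row.fst_mount}: no file system UUID recorded yet")
        device = "UUID=" + row.fst_uuid
    return "\t".join([device, row.fst_mount, row.fst_type, row.fst_opts, "0", "0"])


def check_volume_group(vgroup, pv_sizes, lv_sizes):
    """Raise if a volume group's physical volumes cannot hold its logical volumes."""
    if sum(pv_sizes) <= sum(lv_sizes):
        raise ValueError(f"{vgroup}: physical volumes too small for its logical volumes")


rows = [
    # Hypothetical example values for the device and UUID.
    FstabRow("UUID=/dev/md0", "/", "ext4", fst_uuid="0f1e2d3c-example"),
    FstabRow("/dev/vg1/lv01", "/var", "ext4"),
]
print(needs_creation(rows))   # ['/var']
print(fstab_line(rows[0]))
check_volume_group("vg1", [1000], [400, 500])   # passes: 1000 > 900
```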






Once the server entity has been created, we are ready to start the server build. Before starting, it is best to zero the disk drive partition tables (cfdisk -z /dev/sd?) and make sure that no disk partitions appear in “/dev/”. The server is then booted disklessly by enabling PXE boot in the BIOS. Next, run the buildserver utility to start the build process. Buildserver takes as its only argument the name of the entity that you have just defined; in this case we would provide the entity name 'gasket'. The following steps are carried out by buildserver:

  1. Buildserver loads the information from the fstab, logicalVolumes, mdadm and physicalVolumes tables stored in the configuration database. If inconsistencies are found in the server entity, buildserver exits with an error message, allowing the administrator to make the necessary changes to remedy the situation. This step prints the following text:

    Loading Sysconfig database information

  2. Buildserver then scans the disks looking for existing devices, partitions and file systems. This step prints the following text:


    Matching file systems with disk partitions

  3. Buildserver tries to match this information with the information loaded from the database. If file systems are found, it attempts to match their UUIDs with those in the “fst_uuid” fields. If device names are specified in the sys_config database, buildserver uses those physical device names to build the RAID arrays, physical volumes and partitions. Buildserver asks for confirmation before committing partition tables to disk, and again before recording the actual partition sizes in the database once they have been determined. This step prints the following text:

The following partitions are about to be committed to the disk:
/dev/sdb1, /dev/sdb2, /dev/sdb3, /dev/sdb4, /dev/sdb5, /dev/sdb6
Do you want to continue with this operation? (yes|no)
yes
Disk partition: /dev/sdb1 successfully added
Disk partition: /dev/sdb2 successfully added
Disk partition: /dev/sdb3 successfully added
Disk partition: /dev/sdb4 successfully added
Disk partition: /dev/sdb5 successfully added
Disk partition: /dev/sdb6 successfully added
The following partitions are about to be committed to the disk:
/dev/sda1, /dev/sda2, /dev/sda3, /dev/sda4, /dev/sda5, /dev/sda6
Do you want to continue with this operation? (yes|no)
yes
Disk partition: /dev/sda1 successfully added
Disk partition: /dev/sda2 successfully added
Disk partition: /dev/sda3 successfully added
Disk partition: /dev/sda4 successfully added
Disk partition: /dev/sda5 successfully added
Disk partition: /dev/sda6 successfully added
Assembling meta device volumes
Failed to seek to end of disk: /dev/sda4 (Invalid argument)
Failed to read superblock from /dev/sda4 (No such file or directory)
Failed to read superblock from /dev/sda4 (No such file or directory)
Failed to read superblock from /dev/sda4 (No such file or directory)
Failed to seek to end of disk: /dev/sdb4 (Invalid argument)
Failed to read superblock from /dev/sdb4 (No such file or directory)
Failed to read superblock from /dev/sdb4 (No such file or directory)
Failed to read superblock from /dev/sdb4 (No such file or directory)
mdadm: size set to 58596992K
mdadm: array /dev/md0 started.
mdadm: size set to 976559104K
mdadm: array /dev/md1 started.
mdadm: size set to 585938624K
mdadm: array /dev/md2 started.
mdadm: size set to 324601216K
mdadm: array /dev/md3 started.
vgcreate command stderr output: No physical volume label read from /dev/md1

Command vgcreate output: Physical volume "/dev/md1" successfully created
Volume group "vg1" successfully created

Command lvcreate output: Logical volume "lv01" created

Command lvcreate output: Logical volume "lv02" created

Command lvcreate output: Logical volume "lv03" created

Command lvcreate output: Logical volume "lv04" created

vgcreate command stderr output: No physical volume label read from /dev/md2

Command vgcreate output: Physical volume "/dev/md2" successfully created
Volume group "vg2" successfully created

vgextend command stderr output: No physical volume label read from /dev/md3

Command vgextend output: Physical volume "/dev/md3" successfully created
Volume group "vg2" successfully extended

Updating file sizes in the database with actual values from the disks
Updating size from -1 to 332391645184 for disk: /dev/md3
Do you want to continue with update? (yes|no)
yes


  4. Buildserver then formats the resulting devices according to the specifications in fstab. If buildserver detects a previous file system whose UUID does not match, it will ask for confirmation before reformatting.


Formating unformatted partitions
mke2fs 1.41.9 (22-Aug-2009)

Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
3662848 inodes, 14649248 blocks
732462 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=0
448 block groups
32768 blocks per group, 32768 fragments per group
8176 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424

Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done
This filesystem will be automatically checked every 39 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.

Setting up swapspace version 1, size = 7815616 KiB
no label, UUID=a5cb3b73-c0be-4cec-b4ae-d6f4fe6fb112
Setting up swapspace version 1, size = 7815616 KiB
no label, UUID=dbe4a553-205a-4123-9c6e-b1abb14aace9

etc. for remaining devices...


  5. Buildserver then re-evaluates the disk sizes and, after asking for confirmation, updates the fst_size fields accordingly.

  6. Whether buildserver has had to build and format the partitions or has found the matching file system UUIDs, it then mounts the partitions according to the specifications in the “fstab” table.


Mounting unmounted partitions


  7. Buildserver prints out the currently mounted partitions and asks for confirmation before continuing on to load the partition contents.


Summary of disk info

The following is the status of the file systems in the fstab
/dev/md0 ext4(raid) mounted on: /
/dev/sda2 swap formatted
/dev/sdb2 swap formatted
/dev/vg1/lv01 ext4(lvm) mounted on: /var
/dev/vg1/lv02 ext4(lvm) mounted on: /diskless
/dev/vg1/lv03 ext4(lvm) mounted on: /diskless/vars
/dev/vg1/lv04 ext4(lvm) mounted on: /images

Continue loading disk partitions? (yes|no)


  8. If you choose to continue, buildserver consults the “Rsync” table to determine where to load the partition contents from. For each entry in the “Rsync” table, buildserver asks the administrator whether or not to load that partition.


Rsyncing file systems from master image
Starting rsync from jon:/mnt/dbgen64prod to /
Do you want to rsync this partition? (yes|no)
yes
/usr/bin/rsync -e 'ssh -x -c blowfish' -aSx --delete --exclude-from=/tmp/excludesyb0al2 jon:/mnt/dbgen64prod/ /mnt/slash/ 2>&1
Rsync of directory / on sapphire completed
{YesNoDialogStart}
Starting rsync from jon:/mnt/dbgen64prod/var to /var
Do you want to rsync this partition? (yes|no)
yes
/usr/bin/rsync -e 'ssh -x -c blowfish' -aSx --delete --exclude-from=/tmp/excludes0z26dT jon:/mnt/dbgen64prod/var/ /mnt/slash/var 2>&1
Rsync of directory /var on sapphire completed
{YesNoDialogStart}
Starting rsync from crank:/mnt/dbgen32/diskless to /diskless
Do you want to rsync this partition? (yes|no)
yes
/usr/bin/rsync -e 'ssh -x -c blowfish' -aSx --delete --exclude-from=/tmp/excludes3VdgiV crank:/mnt/dbgen32/diskless/ /mnt/slash/diskless 2>&1
Rsync of directory /diskless on sapphire completed
{YesNoDialogStart}
Starting rsync from crank:/mnt/dbgen32/diskless/vars to /diskless/vars
Do you want to rsync this partition? (yes|no)
yes
/usr/bin/rsync -e 'ssh -x -c blowfish' -aSx --delete --exclude-from=/tmp/excludes9iYaka crank:/mnt/dbgen32/diskless/vars/ /mnt/slash/diskless/vars 2>&1
Rsync of directory /diskless/vars on sapphire completed
{YesNoDialogStart}
Starting rsync from pooky:/mnt/images/cs to /images/cs
Do you want to rsync this partition? (yes|no)
yes


  9. Buildserver next configures the boot sector, asking for confirmation before updating it.

    Setting up the boot sectors
    Do you want to setup the boot sector? (yes|no)
    yes
    Filesystem type is ext2fs, partition type 0x83
    Checking if "/boot/grub/stage1" exists... yes
    Checking if "/boot/grub/stage2" exists... yes
    Checking if "/boot/grub/e2fs_stage1_5" exists... yes
    Running "embed /boot/grub/e2fs_stage1_5 (hd0)"... 17 sectors are embedded.
    succeeded
    Running "install /boot/grub/stage1 (hd0) (hd0)1+17 p (hd0,0)/boot/grub/stage2
    /boot/grub/menu.lst"... succeeded
    Done.

  10. Buildserver builds an in-memory copy of the sys_config database as well as a persistent version stored as XML files in the file system. The in-memory representation allows high-speed updates to template files, whereas the persistent XML representation is used when the system starts up and needs access to configuration information. The following output is generated during this process and shows the creation of the directory hierarchy used to hold the XML configuration files:

    Building custom configuration
    Missing entity type directory: /mnt/slash/usr/local/etc/labnet/client
    Missing entity type directory: /mnt/slash/usr/local/etc/labnet/server
    Missing entity type directory: /mnt/slash/usr/local/etc/labnet/standalone
    Missing entity type directory: /mnt/slash/usr/local/etc/labnet/printer
    Missing entity type directory: /mnt/slash/usr/local/etc/labnet/domain
    Missing entity type directory: /mnt/slash/usr/local/etc/labnet/dns
    Missing entity type directory: /mnt/slash/usr/local/etc/labnet/labnet
    Missing entity type directory: /mnt/slash/usr/local/etc/labnet/testing
    Missing entity type directory: /mnt/slash/usr/local/etc/labnet/client_run_levels
    Missing entity type directory: /mnt/slash/usr/local/etc/labnet/monitor
    Missing entity type directory: /mnt/slash/usr/local/etc/labnet/graphics_card
    Missing entity type directory: /mnt/slash/usr/local/etc/labnet/mouse
    Missing entity type directory: /mnt/slash/usr/local/etc/labnet/fileMgr
    Missing entity type directory: /mnt/slash/usr/local/etc/labnet/ldap_compare
    Missing entity type directory: /mnt/slash/usr/local/etc/labnet/ldap_sync
    Missing entity type directory: /mnt/slash/usr/local/etc/labnet/nms_info
    Missing entity type directory: /mnt/slash/usr/local/etc/labnet/rsyncImage
    Missing entity type directory: /mnt/slash/usr/local/etc/labnet/rsync_filter
    Missing entity type directory: /mnt/slash/usr/local/etc/labnet/service
    Missing entity type directory: /mnt/slash/usr/local/etc/labnet/nms
    Missing entity type directory: /mnt/slash/usr/local/etc/labnet/wimpSession
    Missing entity type directory: /mnt/slash/usr/local/etc/labnet/windowsImage
    Missing entity type directory: /mnt/slash/usr/local/etc/labnet/software
    Building the XML config files.

  11. Buildserver then loads the managed files and processes the managed file templates.

Loading all custom configuration files
Creating missing template directory: /mnt/slash/etc/templates
Creating missing master configuration dir: /mnt/slash/var/lib/labnet/master_config
Removing managed file: initiatorname.iscsi with no instance.
Failed to unlink the managed file: initiatorname.iscsi
Adding missing sysmlink: /mnt/slash/etc/linkdept
Failed to open /mnt/slash/win98/netlogon/logon.vbssiL99x for writing
Failed to open /mnt/slash/etc/php/apache2-php5/php.iniq5YmbR for writing
Failed to open /mnt/slash/etc/request-key.d/id_resolver.confT2ixPQ for writing
Processing all templates


  12. Buildserver checks whether the server certificate exists; if not, it generates a certificate request and sends it to the Labnet certificate signing authority to be signed. The resulting certificate is stored in the appropriate location on the server.


Making server certificates
Creating missing private rsa key directory: /mnt/slash/etc/ssl/private
Creating missing private key file
Generating RSA private key, 1024 bit long modulus
.........................++++++
..............++++++
e is 65537 (0x10001)
Creating missing certificate file


  13. Buildserver performs custom server setup as dictated by the “functions” variable values.


Performing specific server cusomizations
Processing appserver configuration
Missing client_type variable value for server: loader
Processed appserver configuration


  14. Buildserver then sets up the runlevels directory hierarchy according to the “service” entities and the values of this server's “functions” variable.


Updating the system runlevels
Failed to open target directory /mnt/slash/etc/runlevels/sysinit
Reason: No such file or directory
Creating target directory.
Adding new runlevel application link: udev
Adding new runlevel application link: dmesg
Failed to open target directory /mnt/slash/etc/runlevels/default
Reason: No such file or directory
Creating target directory.
Adding new runlevel application link: rpc.idmapd
Adding new runlevel application link: racoon
Adding new runlevel application link: atd
Adding new runlevel application link: smartd
Adding new runlevel application link: autofs
Adding new runlevel application link: lnmenuserver
Adding new runlevel application link: sshd
Adding new runlevel application link: samba
Adding new runlevel application link: postfix
Adding new runlevel application link: lnsysconfig
Adding new runlevel application link: nfs
Adding new runlevel application link: ntp-client
Adding new runlevel application link: dhcpd
Adding new runlevel application link: xinetd
Adding new runlevel application link: udev-postmount
Adding new runlevel application link: ntpd
Adding new runlevel application link: vixie-cron
Adding new runlevel application link: lntftp
Adding new runlevel application link: net.eth0
Adding new runlevel application link: lprng
Adding new runlevel application link: rpcbind

Adding new runlevel application link: local
Failed to open target directory /mnt/slash/etc/runlevels/nonetwork
Reason: No such file or directory
Creating target directory.
Adding new runlevel application link: local
Failed to open target directory /mnt/slash/etc/runlevels/shutdown
Reason: No such file or directory
Creating target directory.

Adding new runlevel application link: killprocs
Adding new runlevel application link: mount-ro
Adding new runlevel application link: savecache
Failed to open target directory /mnt/slash/etc/runlevels/boot
Reason: No such file or directory
Creating target directory.
Adding new runlevel application link: urandom
Adding new runlevel application link: procfs
Adding new runlevel application link: hostname
Adding new runlevel application link: keymaps
Adding new runlevel application link: swap
Adding new runlevel application link: termencoding
Adding new runlevel application link: flashcache
Adding new runlevel application link: modules
Adding new runlevel application link: sysctl
Adding new runlevel application link: serial
Adding new runlevel application link: device-mapper
Adding new runlevel application link: mdraid
Adding new runlevel application link: net.lo
Adding new runlevel application link: hwclock
Adding new runlevel application link: bootmisc
Adding new runlevel application link: fsck
Adding new runlevel application link: lvm
Adding new runlevel application link: lnbootlogger
Adding new runlevel application link: consolefont
Adding new runlevel application link: syslog-ng
Adding new runlevel application link: root
Adding new runlevel application link: mtab
Adding new runlevel application link: localmount


  15. Buildserver checks the ssh keys and, if necessary, creates an ssh key for this computer and stores the public key in the sys_config database for distribution to all Labnet hosts.


Checking ssh keys
Creating a new key file.
Generating public/private rsa key pair.
Your identification has been saved in /mnt/slash/etc/ssh/ssh_host_rsa_key.
Your public key has been saved in /mnt/slash/etc/ssh/ssh_host_rsa_key.pub.
The key fingerprint is:
c3:e7:52:15:c0:33:ad:9a:f5:11:1f:c8:be:a5:b1:3e root@sapphire.lnesd.mun.ca
The key's randomart image is:
+--[ RSA 1024]----+
| ..+.. |
| + =.. |
| =.o . |
| . o.+ o |
| S+o. B |
| o= = |
| . .. |
| . E |
| . |
+-----------------+



  16. Finally, buildserver creates the essential device files that must be present in the “/dev” directory for the system to boot.


Initializing /dev directory
Finished configuring server
Build finished
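The rsync step above, in which buildserver copies each partition from the master image, runs one rsync per enabled row of the Rsync table. The following sketch rebuilds the command lines shown in that output from a table row. The row layout as a Python dict, the function names and the exclude file path are assumptions; the /mnt/slash mount point and the rsync and ssh options are copied from the log as-is, and a current OpenSSH may require a different cipher.

```python
import shlex
import subprocess

TARGET_ROOT = "/mnt/slash"   # where the new system is mounted during the build (per the log)


def rsync_argv(row, exclude_file):
    """Build the rsync command printed during the build for one Rsync table row."""
    src = f"{row['rs_host']}:{row['rs_src'].rstrip('/')}/"   # trailing slash: copy contents
    dst = TARGET_ROOT + ("/" if row["rs_dst"] == "/" else row["rs_dst"])
    return ["/usr/bin/rsync", "-e", "ssh -x -c blowfish", "-aSx", "--delete",
            f"--exclude-from={exclude_file}", src, dst]


def sync_partitions(table, exclude_file, dry_run=True):
    """Run (or just print) rsync for every row whose rs_sync flag is "yes"."""
    commands = [rsync_argv(row, exclude_file) for row in table if row["rs_sync"] == "yes"]
    for argv in commands:
        if dry_run:
            print(shlex.join(argv))
        else:
            subprocess.run(argv, check=True)
    return commands


# Example rows modelled on the log; the "no" flag on the last row is hypothetical.
table = [
    {"rs_host": "jon", "rs_src": "/mnt/dbgen64prod", "rs_dst": "/", "rs_sync": "yes"},
    {"rs_host": "jon", "rs_src": "/mnt/dbgen64prod/var", "rs_dst": "/var", "rs_sync": "yes"},
    {"rs_host": "pooky", "rs_src": "/mnt/images/cs", "rs_dst": "/images/cs", "rs_sync": "no"},
]
commands = sync_partitions(table, "/tmp/excludes-example")
```

Building an argument list rather than a shell string avoids quoting problems; shlex.join is used only to print a copy-and-paste form of each command.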



Assuming that buildserver terminates correctly, the computer can now be rebooted. Make sure that the BIOS is set to boot from the system drive before starting the boot process. The computer should now come up as the configured server, with the services enabled in the configuration database. Note, however, that buildserver does not copy in web pages, set up databases or populate an LDAP server, so the data for these functions must be initialized separately. If this data already exists in a backup, it can be copied into place with a simple rsync command; otherwise these applications can be initialized to begin receiving new data.



Note that the build does not have to take place in the VLAN of the target server; if it does not, the server and/or disks will need to be moved to the appropriate VLAN before the server will boot properly. This is a handy technique for building servers for VLANs that cannot boot diskless clients, or simply for convenience. We use a client computer with hot swap drive bays to build servers quickly on demand.



It should be apparent that by using these in-house imaging utilities, tightly coupled with our configuration database, every aspect of our software environment has been brought under our management infrastructure. In Section 4.1 we turn our attention to how Labnet facilitates the management of the user's data.

3.7 Network Monitoring

To run a large installation, it is necessary to have a “bird's eye” view of the general operating parameters of all servers. To this end, we modified our configuration server daemon so that it also reports status information to a central database at 5-minute intervals. Only changes in status trigger updates to the database, although one of the parameters, the CPU load average, was selected as a heartbeat to ensure that the daemons are running. Because the data is logged to a database, we have not only a current record of the status of all servers but also a historical record of all monitoring status. This is vital for evaluating long-term performance trends when planning upgrades and for detecting failing hardware.
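The reporting policy described above, in which only status changes are written while the load average is always written as a heartbeat, can be sketched as follows. The StatusReporter class, the (category, item) keys and the write_row callback standing in for the database insert are hypothetical; only the 5-minute interval and the choice of heartbeat come from the text.

```python
import time

REPORT_INTERVAL = 300            # seconds; the daemon reports every 5 minutes
HEARTBEAT_CATEGORY = "loadaverage"


class StatusReporter:
    """Write a status row only when it changes, except for the heartbeat."""

    def __init__(self, write_row):
        self.write_row = write_row   # callback: (host, key, value, timestamp) -> None
        self.last = {}               # last value written for each (category, item)

    def report(self, host, samples, now=None):
        """Process one round of samples; return how many rows were written."""
        now = time.time() if now is None else now
        written = 0
        for key, value in samples.items():
            category = key[0]
            if category == HEARTBEAT_CATEGORY or self.last.get(key) != value:
                self.write_row(host, key, value, now)
                self.last[key] = value
                written += 1
        return written


rows = []
reporter = StatusReporter(lambda *row: rows.append(row))
samples = {("loadaverage", "cpu"): 0.4, ("mdstat", "/dev/md0"): "active"}
first = reporter.report("guernsey", samples, now=0)
second = reporter.report("guernsey", samples, now=REPORT_INTERVAL)
samples[("mdstat", "/dev/md0")] = "degraded"
third = reporter.report("guernsey", samples, now=2 * REPORT_INTERVAL)
print(first, second, third)   # 2 1 2
```

Because the heartbeat row is always written, a server whose latest load-average timestamp is older than one interval can be flagged as having a stalled daemon, while the unchanged categories add no rows to the history.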



To view the network monitoring data, we developed a webtool that provides an overview of the status of all the servers, colour coded for the severity of the problem, along with a count of the number of problems. The following diagram illustrates a typical overview of all Computer Science servers as displayed by our webtool:






Note the colour code descriptions. There is also a greyed-out “client” button at the top, which will be activated in the future to show client information, and a domain selector drop-down menu so that the NMS display can be limited to the servers of interest, in this case those of the Department of Computer Science. To see more details about the computer “guernsey”, we would click on its name and the following table would be displayed:




Here the NMS information is reporting a problem with our RAID arrays affecting /dev/md0, /dev/md1 and /dev/md3. Once a new disk has been added to the array and the array has been resynced, the display will revert to green and show a status of active. The following categories of server status are collected:



  • daemon – shows a list of the daemons that this server has been configured to run and indicates whether they are running or not.

  • diskinodes – shows the percentage of the inodes that are consumed on the indicated file system.

  • diskspace – shows the percentage of the disk space that has been consumed on the indicated file system.

  • fsck – indicates whether or not the file system will do a file system check on reboot.

  • loadaverage – indicates the current load average, normalized per CPU.

  • mdstat – indicates the status of the RAID arrays.

  • mounts – indicates whether or not the file systems in the fstab are currently mounted.

  • netrx – indicates the percentage of the maximum bandwidth for this Ethernet port that is being used for incoming packets.

  • nettx – indicates the percentage of the maximum bandwidth for this Ethernet port that is being used for outgoing packets.

  • uuids – indicates whether or not the UUIDs of all the file systems match the UUIDs specified in the database.
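Several of these categories can be measured with standard operating-system calls. The sketch below computes the loadaverage, diskspace and diskinodes values and flags degraded arrays in /proc/mdstat text. The function names are hypothetical, the code runs only on Unix-like systems, and the disk percentages are computed against total blocks and inodes, so they may differ slightly from the figures printed by df.

```python
import os
import re


def loadaverage_per_cpu():
    """1-minute load average divided by the number of CPUs."""
    one_minute, _, _ = os.getloadavg()
    return one_minute / (os.cpu_count() or 1)


def disk_usage_percent(path):
    """Return (diskspace %, diskinodes %) used on the file system holding path."""
    st = os.statvfs(path)
    space = 100.0 * (st.f_blocks - st.f_bfree) / st.f_blocks if st.f_blocks else 0.0
    inodes = 100.0 * (st.f_files - st.f_ffree) / st.f_files if st.f_files else 0.0
    return space, inodes


def degraded_arrays(mdstat_text):
    """Names of md arrays whose member map (e.g. [U_]) shows a missing disk."""
    degraded, current = [], None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
        members = re.search(r"\[([U_]+)\]", line)
        if current and members and "_" in members.group(1):
            degraded.append(current)
    return degraded


sample = """md0 : active raid1 sdb1[1] sda1[0]
      58596992 blocks [2/2] [UU]
md1 : active raid1 sda3[0]
      976559104 blocks [2/1] [U_]
"""
print(degraded_arrays(sample))   # ['md1']
print(disk_usage_percent("/"))
```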

The status of other activities, such as backups and server image syncing, can be viewed using the tools described elsewhere in this document. This tool has proven invaluable for troubleshooting system problems, and periodic checks of the NMS display often avert more serious problems.



The “warning” and “critical” status levels can be configured using the system configuration database entity type “nms”. Each of the nms categories described above has a corresponding entity describing the ranges or values that are considered warnings or critical. The following is a sample entity describing the “loadaverage” metric:






New network monitoring categories can easily be added by developing a utility that logs status information to the NMS database and creating a corresponding entity of type “nms”.
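Classifying a reported value against an “nms” entity can be sketched as follows. The NmsRule fields, the rule table, the status strings and the threshold numbers are hypothetical illustrations, not the attributes of the actual “nms” entity type: numeric metrics are compared against warning and critical limits, while text metrics such as mdstat are compared against sets of bad values.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class NmsRule:
    """Hypothetical shape of an "nms" entity for one monitoring category."""
    warning: Optional[float] = None      # numeric value at or above which to warn
    critical: Optional[float] = None     # numeric value at or above which it is critical
    warning_values: set = field(default_factory=set)    # text values that warn
    critical_values: set = field(default_factory=set)   # text values that are critical


def classify(rule, value):
    """Return "critical", "warning" or "ok" for one reported value."""
    if isinstance(value, str):
        if value in rule.critical_values:
            return "critical"
        return "warning" if value in rule.warning_values else "ok"
    if rule.critical is not None and value >= rule.critical:
        return "critical"
    if rule.warning is not None and value >= rule.warning:
        return "warning"
    return "ok"


# Example thresholds only; a new category needs a logger plus a new entry here.
RULES = {
    "loadaverage": NmsRule(warning=2.0, critical=4.0),
    "diskspace": NmsRule(warning=85.0, critical=95.0),
    "mdstat": NmsRule(warning_values={"resyncing"}, critical_values={"degraded"}),
}
print(classify(RULES["loadaverage"], 2.5))   # warning
print(classify(RULES["mdstat"], "active"))   # ok
```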

4 Managing the User's Data

4.1 Labnet Software for User Support

Providing a secure, reliable and conveniently accessible place for our users to store data is one of the primary goals of Labnet. It is therefore essential to protect the security and integrity of the users' data and to provide for disaster recovery. The yellow areas in the following diagram show where Labnet tools have been used to augment general administrative practices for managing the users' data store and authentication:



4.2 User Authentication

The LDAP server used for Labnet authentication is independent of Memorial's authoritative LDAP server so that temporary accounts can be created for visitors or special teaching sessions. A webtool designed specifically for creating and modifying Labnet accounts is used to administer these special accounts. For individuals with an entry in Memorial's authoritative LDAP server, the account can be flagged as 'linked', and the ldapsync daemon will keep the specified attributes in the Labnet LDAP server synchronized with those in the authoritative LDAP server. In this way, our user community can avail of a campus-wide single sign-on for accessing a wider range of computing resources. Our master Labnet LDAP server is replicated to several additional servers for redundancy using the syncrepl protocol.
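The linking behaviour can be pictured with the following Python sketch, which models both directories as in-memory dictionaries rather than live LDAP connections. It is not the ldapsync daemon: the "linked" attribute name, keying entries by uid and the choice of synchronized attributes are all assumptions made for illustration.

```python
from typing import Dict, List

# LDAP-style entry: attribute name -> list of values.
Entry = Dict[str, List[str]]


def sync_linked(labnet: Dict[str, Entry], authoritative: Dict[str, Entry],
                synced_attrs: List[str]) -> List[str]:
    """Copy the selected attributes from the authoritative directory into
    every Labnet entry flagged as linked; return the uids that changed."""
    changed: List[str] = []
    for uid, entry in labnet.items():
        if entry.get("linked") != ["TRUE"]:
            continue  # unlinked accounts (e.g. visitors) are left alone
        source = authoritative.get(uid)
        if source is None:
            continue  # no counterpart in the authoritative directory
        updated = False
        for attr in synced_attrs:
            if attr in source and entry.get(attr) != source[attr]:
                entry[attr] = list(source[attr])
                updated = True
        if updated:
            changed.append(uid)
    return changed
```

Running it twice in a row is harmless: the second pass finds nothing to change and returns an empty list.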


The user community is divided into two groups: one uses Labnet client computers as its primary computing platform, and the other uses personal computers and laptops as its primary computing platform but wishes to avail of the data storage and printing resources of Labnet. For Microsoft Windows users with non-Labnet computers, a special VB script called 'Nomad' can be downloaded to automatically connect the user's home directory share and Labnet remote printing services. The following screen shot shows what this application looks like:




Faculty and staff members use this feature to automatically back up research data from their personal computers. Labnet users can also download additional programs to perform automatic desktop backups, or desktop-to-Labnet synchronization in which changes made to either directory hierarchy are reflected in the other. These packages are DeltaCopy and Unison File Synchronizer, respectively, and they will work in Linux, Mac OS and Microsoft Windows environments.


Users who log in to a Labnet client are presented with a home directory as their primary data storage area. For Microsoft Windows users, this means they see familiar folders such as My Documents on their H: drive. Linux users see the same directory but tend to use different, Linux-specific folders. Users who log in from their own personal computers (Linux, Microsoft Windows or Mac) are presented with the same home directory as a network share mapped to a drive or directory of their choosing. In either case, our redundant LDAP servers protect the user data by authenticating users with an LDAP bind, checked against a SHA-1 password hash for Linux users or against an MD4 password hash through Samba for Microsoft Windows users.
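On the Linux side, an OpenLDAP-style "{SHA}" userPassword value is the Base64 encoding of the raw SHA-1 digest of the password. The Python sketch below shows how such a value is formed and what the server conceptually compares during a bind. It assumes unsalted {SHA} values (a directory might equally use salted {SSHA}), and the function names are hypothetical. The Samba-side MD4 (NT) hash is left out because hashlib only offers MD4 when the underlying OpenSSL build provides it.

```python
import base64
import hashlib
import hmac


def ldap_sha_hash(password: str) -> str:
    """Build an unsalted '{SHA}' userPassword value."""
    digest = hashlib.sha1(password.encode("utf-8")).digest()
    return "{SHA}" + base64.b64encode(digest).decode("ascii")


def check_password(password: str, stored: str) -> bool:
    """Compare a bind password against a stored '{SHA}' value."""
    if not stored.startswith("{SHA}"):
        raise ValueError("this sketch only handles unsalted {SHA} values")
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(ldap_sha_hash(password), stored)
```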


4.3 Ldap Webtools

4.4 User Backups

Every evening, backupsync is launched on all our backup servers and, guided by information in our configuration database, performs incremental backups to local disks using rsync's --link-dest option. Files that have not changed since the previous backup are hard linked to the previous backup copy, and only changed files are copied to disk. Thus each daily backup appears as a complete directory tree identical to the original as it was on the day of the backup, while consuming new disk space only for changed files, unless the directories have been rearranged (--link-dest matches files by their relative path, so moved or renamed files are copied again). The following diagram illustrates this process:





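The effect of --link-dest can be reproduced in a few lines of Python, which may help when reasoning about disk usage. This is a simplified sketch, not backupsync itself: like rsync's default quick check, it treats a file as unchanged when its size and modification time match the copy in the previous snapshot, and it makes no attempt to handle symbolic links, ownership or files deleted from the source. The rsync equivalent would be along the lines of rsync -a --link-dest=PREVIOUS SOURCE/ TODAY/, where the three paths are placeholders.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def snapshot(source: Path, dest: Path, previous: Optional[Path]) -> None:
    """Write a complete copy of `source` into the new directory `dest`.

    Files whose size and mtime match the same relative path in `previous`
    are hard linked instead of copied, so they use no extra data blocks.
    """
    for root, _dirs, files in os.walk(source):
        rel = Path(root).relative_to(source)
        (dest / rel).mkdir(parents=True, exist_ok=True)
        for name in files:
            src = Path(root) / name
            out = dest / rel / name
            old = previous / rel / name if previous is not None else None
            if old is not None and old.is_file():
                s, o = src.stat(), old.stat()
                if s.st_size == o.st_size and int(s.st_mtime) == int(o.st_mtime):
                    os.link(old, out)  # unchanged: share the previous copy's inode
                    continue
            shutil.copy2(src, out)  # new or changed: real copy, mtime preserved
```

Because shutil.copy2 preserves modification times, a file copied into one snapshot still matches its source the next night and is linked rather than copied again.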
A webtool exists that allows users to extract their files from this backup tree. The feature can be accessed from the webtools application 'Retrieve Backup Files' and displays the following top level page that allows the user to select the desired backup date:





After the date has been selected, the following page is displayed, from which the user can download a file from their backup:





By clicking on directory names, the user can drill down through the directory structure to the location of the desired file. Clicking on the file name then brings up the dialog box shown in the following screen shot, which lets the user choose how to handle the backup file:




Since the selected file has been identified as an OpenOffice file, the user can either load it directly into the OpenOffice application or save it to disk in the user's home directory. If the user selects the latter option, the following dialog box lets the user specify the directory and file name under which to save the retrieved backup file:




4.5 The Backup Utility

System data, as well as user data, is backed up daily, and the whole process is mediated by fields in the system configuration database. On each of our backup servers, cron initiates the backup process by calling the backupsync utility. This and other processes can be scheduled to start at any time or frequency by setting up the crontab_tasks table for the corresponding server in its system configuration entity as follows:



This crontab_tasks entry, taken from the entity description of one of our backup servers, causes the backup program to start at 00:04, followed shortly after, at 00:10, by the image rsync utility. At 12:20 p.m., the backup clean utility is started to free disk space by removing the oldest backups, making room for the next nightly backup. The -m 90 argument sets the high water mark as a percentage of used disk space; in this example, the oldest backups are removed until at least 10% of the disk space is available for the evening backup. This table is used to generate the /etc/cron.d/ln_crontab file.
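The translation from crontab_tasks rows to /etc/cron.d/ln_crontab can be sketched in Python as follows. The CrontabTask fields, the root user and the image_rsync and backup_clean command names are hypothetical stand-ins (only backupsync is named in this guide); the output follows the standard /etc/cron.d line format of five time fields, a user name and the command.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CrontabTask:
    """Hypothetical shape of one crontab_tasks row."""
    minute: str
    hour: str
    command: str
    day_of_month: str = "*"
    month: str = "*"
    day_of_week: str = "*"
    user: str = "root"


def render_cron_d(tasks: List[CrontabTask]) -> str:
    """Render rows in /etc/cron.d format, which includes a user field."""
    lines = ["# Generated from the crontab_tasks table; do not edit by hand"]
    for t in tasks:
        lines.append(" ".join([t.minute, t.hour, t.day_of_month, t.month,
                               t.day_of_week, t.user, t.command]))
    return "\n".join(lines) + "\n"  # cron ignores a final line without newline


tasks = [
    CrontabTask("4", "0", "backupsync"),
    CrontabTask("10", "0", "image_rsync"),          # hypothetical command name
    CrontabTask("20", "12", "backup_clean -m 90"),  # hypothetical command name
]
print(render_cron_d(tasks), end="")
```

The trailing newline matters: crontab(5) requires every entry, including the last, to end with one.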

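The high water mark logic of the backup clean utility might look like the following Python sketch, which works on a dictionary of dated backups rather than on the real backup tree. The function name and its inputs are hypothetical. One subtlety with hard-linked snapshots is that deleting a snapshot frees only the blocks no other snapshot links to, so a real implementation would more sensibly re-measure free space after each removal than trust a precomputed size.

```python
from datetime import date
from typing import Dict, List


def backups_to_remove(freed_by: Dict[date, int], used: int, total: int,
                      high_water_pct: float = 90.0) -> List[date]:
    """Choose the oldest backups to delete until usage is at or below the
    high water mark; with -m 90 that leaves at least 10% of the disk free.

    `freed_by` maps each backup date to the bytes its removal would free.
    """
    limit = total * high_water_pct / 100.0
    victims: List[date] = []
    for day in sorted(freed_by):  # oldest backup first
        if used <= limit:
            break
        victims.append(day)
        used -= freed_by[day]
    return victims
```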

Each server has a Backup_Table entry that governs its backups. The following screen shot of one of our file servers shows the contents of this table:




In this example, the file server has six user directories, /users/labnet2/fs1-6, that are backed up daily on the backup server 'bumper' into the base directory /backup1/users for fs1-3 and /backup2/users for fs4-6. Each backup directory name is a mangled form of the source server name and directory path, with each '/' in the path replaced by '_'. In this case, the mangled name of /users/labnet2/fs1 is:


/backup2/users/carme.pcglabs.mun.ca_users_labnet2_fs1


This directory contains a series of directories having names that correspond to the date that the backup was performed. For example, the backup done on May 24, 2010 would be located at the following pathname:


/backup2/users/carme.pcglabs.mun.ca_users_labnet2_fs1/May24_2010


The directory tree rooted at the above directory is an exact replica of the directory /users/labnet2/fs1 on carme as it was on May 24, 2010. The bk_date column displays the date of the last successful backup. Similarly, the bk_tsm_date column displays the date of the last successful backup of the user directories to the Tivoli Storage Manager tape backup device.
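The naming scheme can be expressed as a short Python sketch that reproduces the example pathname above. The function names are hypothetical, and the date format is inferred from the single example May24_2010; in particular, whether days before the 10th are zero padded is an assumption (here they are not).

```python
from datetime import date
from pathlib import PurePosixPath

# Fixed English abbreviations so the result does not depend on the locale.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def mangled_name(host: str, source_dir: str) -> str:
    """Join the source host and directory, turning each '/' into '_'."""
    return host + source_dir.replace("/", "_")


def backup_path(base: str, host: str, source_dir: str, day: date) -> PurePosixPath:
    """Location of one daily backup: <base>/<mangled name>/<date stamp>."""
    stamp = f"{MONTHS[day.month - 1]}{day.day}_{day.year}"  # no zero padding (assumed)
    return PurePosixPath(base) / mangled_name(host, source_dir) / stamp


print(backup_path("/backup2/users", "carme.pcglabs.mun.ca",
                  "/users/labnet2/fs1", date(2010, 5, 24)))
# /backup2/users/carme.pcglabs.mun.ca_users_labnet2_fs1/May24_2010
```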


To get an overall view of the backup status, the 'View Backup' webtool is used. The following table shows the overall backup status for all of our servers, color coded to indicate status:





Clicking on any one of the server names will bring you to a more in depth view of the backup status on a per directory basis. The following screenshot illustrates what we would have seen by clicking server 'carme' on June 9th:





Based on the information in this screenshot, we can see that the user file systems on 'carme' were successfully backed up by 'bumper' and that the system 'var' partition was successfully backed up by the backup server 'crank'. The screenshot also shows the amount of free disk space on each of the server's partitions.
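The colour coding in the 'View Backup' tables is presumably driven by how old the last successful backup (bk_date) is. A minimal Python sketch of such a rule follows; the one-day and three-day cut-offs and the colour names are hypothetical, not the values the webtool actually uses.

```python
from datetime import date


def backup_colour(bk_date: date, today: date) -> str:
    """Hypothetical colour rule based on the age of the last good backup."""
    age = (today - bk_date).days
    if age <= 1:
        return "green"   # last night's backup succeeded
    if age <= 3:
        return "yellow"  # a recent backup was missed
    return "red"         # backups have been failing for several days


print(backup_colour(date(2010, 6, 8), date(2010, 6, 9)))  # green
```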


The backup status can also be viewed from the perspective of the backup server as shown in the following screen shot:





4.6 Labnet User Printing
