An Automated Approach to System Administration
Computer System Manager
Department of Computer Science
Labnet is the culmination of 6 years of work aimed at automating the management of computer labs within Memorial University using Windows XP, Windows 7 and Linux. Labnet is the most recent incarnation of a long history of automated lab management schemes that we have put in place for the university over the past 20 years. This work has been largely the effort of the systems programming staff of the Department of Computer Science, with assistance from the Department of Computing and Communications and the Department of Student Affairs (Housing). It has grown from 60 workstations located in a couple of labs within the Department of Computer Science to include 800 workstations and 60 servers located in 28 labs in virtually all buildings on campus, including the university residences and library.
When the Department of Computer Science was asked to design a software system to manage general access computer labs at Memorial, a great deal of thought went into formulating an appropriate design. The following criteria were established as the guiding principles in all our subsequent design decisions:
dual boot of Linux and Microsoft operating systems
guaranteed system software integrity
reliable and robust in the event of hardware failures
provides a uniform set of services to all clients where possible
ensures the security and integrity of the users' files and data
easy to administer and maintain (plug and play)
hierarchical authority management realms
Four years later a system has emerged that is a finely tuned embodiment of these guiding principles. Each of the following sections will briefly elaborate on how Labnet was able to achieve these goals.
Dual Boot Option
Labnet is based on Linux, a highly configurable open source operating system that can be easily modified to automate system administrative tasks. However, many of our users are more familiar with Microsoft operating systems such as Windows XP and Windows'98, so many Labnet client computers allow users to choose either Windows'98 or Windows XP instead of Linux. The newer labs offer Microsoft Windows XP whereas the older labs offer Microsoft Windows'98. The latest version of Gentoo Linux is available on all Labnet computers along with a choice of X window managers. On high end computers more fully featured window managers can be selected, while on low end computers lightweight window managers can be selected so as not to compromise response time.
To understand how system software integrity can be guaranteed, some brief explanation of the inner workings of Labnet is necessary. A useful feature of Linux is that it can be configured to start up and run without a hard drive and Labnet uses this feature to ensure system integrity. Because there are no hard drives, there is no need to do local software updates and installs. Instead, all the software is installed and updated on a single application server that can provide software for an entire lab. Lab computers cannot modify the contents of the server and users cannot login or otherwise access the server. By storing the software on the server and by protecting the server in this way, we are able to guarantee the software integrity of the entire lab. Appendix II goes into more detail on how diskless Linux client computers boot.
Imaging Microsoft Operating Systems
To run a Microsoft operating system, the computer must have a hard drive. Typically, using a hard drive introduces management nightmares involving software maintenance, viruses, security hacks and the application of patches. To circumvent these problems, Labnet uses Linux to repair the image on the hard disk each time a user logs off. Typically this process takes less than a minute to perform. By the time the next user sits down to use the workstation, the image on the disk has been restored to its original pristine state. As a side effect, all operating system viruses are eradicated automatically with no human intervention or any anti-virus software. Again the software integrity of the lab can be guaranteed. Appendix III explains in more detail how the Windows XP/98 images are deployed on the client computers.
Reliability and Robustness of Labnet
To prevent a single point of failure from crippling one of our labs, Labnet makes extensive use of redundancy. Typically we use redundant disk technology (RAID) so that a single disk failure does not cause loss of data or down time. If a server should ever fail, we also have a spare computer on hand in each of our major computer rooms that can take the place of a failed server. All our server disks are mounted on drive trays that can easily be replaced allowing full service to be restored within minutes.
Because Labnet software is so uniform across all computers, the diagnosis of hardware problems is swift and accurate. Typically a service technician must reinstall the system software before assessing whether the problem is of a hardware or software nature. Since all Labnet computers share the same image, reinstallation of software is never required to diagnose hardware problems (See Appendix I). Software problems will be experienced on all systems, while problems with individual computers usually indicate a hardware problem. This drastically cuts down on the number of technical calls and the time required to service the call.
Since problems are inevitable in any computer network, procedures must be in place to ensure the early detection of hardware and software problems. Labnet uses a web-based network monitoring service known as Nagios to monitor the status of all the computers under our jurisdiction. With timely intervention we are able to fix most problems from our office, even though the computers affected may be across campus. This is a real asset in our widely distributed environment.
Labnet also performs automated backups on a daily basis, so even in the event of a catastrophic system failure, recovery of system programs and users' files can be readily achieved. Daily 'snapshots' of our file systems are automatically stored on large disk arrays every night using a two month rotational incremental backup scheme. Essentially, every evening a snapshot of all the files that have changed during the incremental period is made on our backup disk system. (See Appendix V.)
Labnet Standard Services
Labnet is based on one of the popular Linux distributions, Gentoo. It offers a rich set of academic software packages some of which are excellent alternatives for most types of commonly used commercial software. For example Gentoo has software development packages and programming languages that we use in virtually all computer science courses. It also has word processing and presentation packages, spreadsheet packages, several SQL databases, mathematical plotting and simulation packages, statistical packages, geographical demographics packages, Photoshop-like image editing systems, html viewing and editing tools, and many more. More than 1800 packages are currently installed in our Gentoo Labnet image.
Labnet servers can be configured to provide a number of useful network services such as email and web serving. To change the role of a server, only a small configuration change is needed. Administrators do not have to spend valuable time installing software to bring up these new services.
The Linux Terminal Server Project
In labs where computers are old and not capable of running the latest Microsoft or Linux applications, the Linux Terminal Server Project, LTSP, can breathe new life into these old computers. LTSP can also be used in computer labs that work primarily in Linux but must run some Microsoft applications from time to time. LTSP client computers can run Microsoft and Linux applications from more powerful remote servers. The application acts as if it were running on the local computer except that it is much faster. This also means that Microsoft applications can be run on old computers without having to purchase software licenses for lab computers. Only software licenses for the server must be purchased.
Labnet provides network accessible printing services to over 35 laser printers on campus which can be transparently used by Windows and Linux applications. Students using our wireless access points can print to any of these printers using Nomad, a locally developed script for mounting printer and file shares. Labnet also has a printing cost recovery mechanism for laser printer use. The system incorporates a cashless payment mechanism based on the cash chip located on student IDs. This software was developed locally to meet our specific needs and includes features such as statistical reports, exemptions, paper quotas and refunds, to name just a few.
Home Directory Service
Labnet provides each user with a home directory so that local work can be saved and accessed later from any other Labnet computer. The same files can be accessed from either Linux or Windows.
Many students are now bringing their laptop computers on campus to use in the classroom and in the Library. MUN offers wireless network access to these users and, as a value-added feature, they can access their Labnet home directory and any Labnet printer. Access is provided by Nomad, a VBScript utility that can be downloaded from the Labnet home page. It asks the user for a user name and password and then mounts the home directory or printer as a share on the local computer. (See Appendix VIII for a Nomad screen shot.)
Home Directory Backup Service
Home directories are backed up every night to large RAID disks using an incremental backup scheme based on the rsync Linux utility. This scheme stores the files in a standard Linux file system format that can be browsed using standard file access utilities. This allows us to provide a web based backup retrieval program so that students can retrieve their files from backup without operator intervention. They simply log into the backup web tool, select the date and then select the desired file. The web browser then asks them whether to save the file and under what name. (See Appendix VI for more details and a screen shot.)
Uniform Working Environment
Labnet provides an extremely uniform and stable environment for the users so that they do not waste a lot of time adjusting to the peculiarities of differently configured computers. By storing each user's personal configuration preferences with their personal files, we are able to provide a consistent and customized computing environment for that user on any Labnet computer.
Scalability of Labnet
The software running on all the client workstations is centrally controlled and updated from a single server that cannot be tampered with. The software for this server can be managed and updated from a central site. All other servers that control individual labs and office computers are automatically updated from this master server. Where we have a large number of servers, we have several sub-master servers that are synchronized from the master, and they in turn aid the master in updating the rest of the servers. This hierarchical approach can easily be scaled to any size network. (See Appendix I.)
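The hierarchical update scheme can be pictured as a tree that is walked one tier at a time. The following is a minimal sketch only, not the Labnet implementation: the server names and the parent_of mapping are hypothetical, and it simply groups servers into "waves" so that each wave synchronizes from servers already updated in the previous wave.

```python
def sync_waves(parent_of, master):
    """Group servers into waves; each wave pulls from the wave before it.

    parent_of maps a server name to the server it synchronizes from.
    All names used here are illustrative assumptions.
    """
    children = {}
    for server, parent in parent_of.items():
        children.setdefault(parent, []).append(server)

    waves = []
    frontier = [master]
    while frontier:
        next_frontier = []
        for server in frontier:
            next_frontier.extend(sorted(children.get(server, [])))
        if next_frontier:
            waves.append(next_frontier)
        frontier = next_frontier
    return waves


# Hypothetical three-tier topology: master -> sub-masters -> lab servers.
topology = {
    "submaster-a": "master",
    "submaster-b": "master",
    "lab1": "submaster-a",
    "lab2": "submaster-a",
    "lab3": "submaster-b",
}
waves = sync_waves(topology, "master")
print(waves)
```

Because the master only serves the first wave directly, its load depends on the number of sub-masters rather than on the total number of servers, which is the property that lets the scheme scale.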
Limitations in Microsoft Windows Scalability
There are some limitations on the scalability of Windows imaging. The Microsoft operating system is not very accommodating to variations in computer hardware. Unlike Linux, Windows images must be exposed to each variation before the image will work on a particular group of computers. If the computer hardware in a lab is homogeneous, then a single Microsoft image will be sufficient but, unless the variations are minimal, separate images are required for each different computer lab. Microsoft Windows images are large, so minimizing the size of each image and the number of images is important. The process of maintaining these images can demand a great deal of work from a system administrator, particularly if there are many images to maintain.
To facilitate the creation and maintenance of Windows XP images a web tool has been developed to step a system administrator through the process of image creation or update. The web tool not only provides instructions on how various software packages are to be installed but also runs scripts that take a snapshot of the C: disk and store it on the server. The web tool then activates the new image for subsequent reboots.
Non Proprietary Solutions
From the beginning our philosophy has been to stay away from proprietary hardware and software solutions for a number of economic and practical reasons. First, becoming locked into a particular vendor's software or hardware solution can be expensive and very difficult to move away from once you become dependent. All Labnet servers and clients are based on inexpensive off-the-shelf computer components. If one of our lab partners decided to opt out of Labnet they would still be able to use the computers in their labs as Microsoft Windows boxes. Similarly all the Labnet software on the Linux side is non proprietary, including all the locally written software. This means that Labnet partners are not locked into expensive software licensing agreements or software vendors.
Concern is sometimes expressed that Linux is only open source and could disappear at any time. What most people do not understand is that Linux, and the software that surrounds Linux, has been developed by hundreds of thousands of dedicated software developers around the world. These software developers have worked together rather than against each other in producing a wide range of software products based on the efforts of those that have gone before. Many software standards have their origins in the open source community. For example, HTML, the markup language of the World Wide Web, was developed and pioneered at the CERN research center in Geneva. Commercial companies have now been established to support open source projects for commercial and governmental applications. Open source is in no danger of dying and is, in fact, destined to evolve and supplant inferior commercial products.
The Labnet software on our master server is updated on a daily basis from one of several Gentoo Linux distribution centers. This ensures that the latest bug fixes and patches are integrated into the image on a regular and timely basis. These changes are propagated to all our production servers on a daily basis, or immediately, depending on the urgency of the update. Because our master server is not in a production role and is not readily accessible from the network, the integrity and security of our system software can be guaranteed. Even if one of our production servers was to become corrupted, the master server would be able to repair the corrupted files.
Labnet provides a single authentication procedure that authenticates users based on a login name and password known only to the user. Whether logging into Linux or Windows the same procedure is used. Labnet uses a centralized password database based on LDAP (Lightweight Directory Access Protocol) that allows for centralized account management for the 13,000 users accessing our 800 client workstations.
The user's home directory share is mounted automatically using Samba whenever the user logs in under Microsoft Windows. Similarly the user's home directory is mounted automatically by the Linux automounter whenever a user logs in under Linux. In either case all user files are protected by the Linux file protection scheme that allows the user to control access to their files.
Plug and Play Administration
The great success of Labnet throughout the university has been largely due to the ease with which computer labs can be set up and administered. A centralized database of configuration information lies at the heart of Labnet. A web interface allows system administrators to enter configuration information for new computers or to change existing computer configurations. (See Appendix VII for screen shots.) Each server runs a daemon that polls the master configuration database every minute and checks for changes. If changes are detected, the daemon propagates the changes to the appropriate configuration files and restarts affected services. Services such as DHCP, DNS, SSHD and NFS are all configured from this master database. Not only are server and client computers configured from the master database but also printers and standalone computers. For example, once the template configuration for a group of client computers has been added to the master database, all that is needed to bring the computers on line is to plug them into the network and give them a name. The rest is automatic, including the partitioning and formatting of the hard drive, the generation of security keys, the creation of a local /var partition and loading of the Windows software. All the software is centrally controlled and updated from application servers that cannot be tampered with.
All the application servers in turn are managed automatically from a master application server. The master server automatically configures and updates all the application servers whenever software is added or upgraded on the master server. Software changes can be propagated to remote servers from a centralized site without human intervention. This model allows for the rapid deployment of hundreds of client computers and multiple servers easily and effectively.
Hierarchical Realms of Authority
Many of the administrative functions described above are performed by web based tools that hide the system complexity from the local administrator and users. These web based tools provide users and administrators with the ability to manage the computer in a controlled manner without fear of compromising the security or integrity of other Labnet systems. We currently support hierarchical realms of authority that grant varying degrees of privilege, so that privileged administrative functions can be performed in a controlled and safe manner. The following administrative realms are currently supported:
Web based administrative functions that can be accessed by all users include:
personal backup retrieval
resetting personal passwords
personal print job removal
display of printer account balance and stats
Web based administrative functions that can be accessed by student assistants include:
removal of any job from any print queue
creation, renewal, password setting of accounts with student card
Web based administrative functions that can be accessed by departmental system administrators include:
adding and updating departmental computer/printer configuration in the master configuration database
control of printer refunds and exemptions
ability to update user LDAP information without student card
Web based administrative functions that can be accessed by privileged system administrators include:
access to all fields of the LDAP authentication directory service.
ability to set new variables and set administrative privileges for users of the master configuration database
It should be noted that departmental administrative functions are restricted to the designated department. With this hierarchical approach to authorization, administrative functions can be judiciously assigned based on the experience of the user and requirements of the position.
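The realm structure above can be sketched as an ordered set of privilege levels plus a department check. This is an illustration only: the action names are hypothetical, and two assumptions not stated in the text are made explicit in the comments, namely that each realm inherits the functions of the realms below it and that privileged administrators are not restricted to a department.

```python
from enum import IntEnum


class Realm(IntEnum):
    USER = 1
    STUDENT_ASSISTANT = 2
    DEPT_ADMIN = 3
    PRIVILEGED_ADMIN = 4


# Minimum realm required for a few representative actions (names are hypothetical).
REQUIRED = {
    "retrieve_own_backup": Realm.USER,
    "remove_any_print_job": Realm.STUDENT_ASSISTANT,
    "update_dept_config": Realm.DEPT_ADMIN,
    "edit_all_ldap_fields": Realm.PRIVILEGED_ADMIN,
}

# Actions that departmental administrators may perform only within their own department.
DEPT_SCOPED = {"update_dept_config"}


def is_allowed(realm, action, user_dept=None, target_dept=None):
    # Assumption: higher realms inherit every function of the lower realms.
    if realm < REQUIRED[action]:
        return False
    # Departmental functions are restricted to the designated department.
    # Assumption: privileged administrators are exempt from this restriction.
    if action in DEPT_SCOPED and realm == Realm.DEPT_ADMIN:
        return user_dept is not None and user_dept == target_dept
    return True


print(is_allowed(Realm.DEPT_ADMIN, "update_dept_config", "cs", "math"))
```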
Cost Effectiveness of Labnet
In assessing the total cost of ownership of a computer installation, one of the most expensive components is the cost of technical support. The goal of Labnet, from its inception, has been to automate the processes involved with managing large computer installations. Recall that the following processes are automatically performed by Labnet: client software installation, server software consistency check, server and client software updates from the master server, network monitoring and backups. In addition the following processes can be administered from the control center: software patches and updates to the master server, the installation of new software and diagnosis of remote hardware and software failures. Since hardware errors are more readily apparent with the Labnet infrastructure in place, this saves on unnecessary trips to remote labs. It is apparent that vast savings can be achieved through using this automated approach.
For an example of one small way in which Labnet has saved money, all we need do is look at the costs involved in cleaning up after a campus virus attack. Each virus outbreak on average costs the university about $40,000 in technical support for non Labnet computers. Although Labnet accounts for about twenty percent of the computers on campus, it has not been affected by any of these viruses.
The use of open source software has also resulted in big savings in software licensing fees. Many of the Labnet computers run only Linux and cost the university absolutely nothing in software costs. By using the LTSP software the Department of Computer Science is able to meet the limited demands for Microsoft software from a single Windows 2000 server. Many labs that do provide Microsoft applications run only a minimum number of Microsoft applications and rely on Linux for the rest.
By reusing old computers as LTSP devices, hardware thought to be obsolete can be rejuvenated, thereby providing more cost effective hardware utilization.
Our printer cost recovery software took in $65,000 in revenue from January 1, 2005 to January 1, 2006 and has not only covered the cost of consumables but has also covered the replacement cost of printers as they wear out.
It is interesting to note that even though the services of Labnet are superior, the overall costs of running Labnet are extremely low, so there are no trade-offs in running Labnet.
Managing a large computer installation is becoming an increasingly difficult task as operating systems become more complex and network attacks become more sophisticated. The number of highly skilled computer systems administrators is limited and their salaries are sometimes high. Labnet has allowed the expertise of the Department of Computer Science to be utilized for the greater good of the whole campus at little or no extra cost to the university. Support staff in the various departments serviced by Labnet are free to engage in more productive work or research now that their computers are more stable. When they do have problems, they have a group of experts to call upon. New computer labs have not required additional support staff due to the problem-free operation of the labs. Labnet technology has also been successfully applied to the office environment. Within the Department of Computer Science and the Department of Mathematics and Statistics most of the faculty and staff are using Labnet in their offices. This service was offered as a voluntary upgrade and many faculty and staff have opted for Labnet due to its reliability.
It should be noted that there are two diametrically opposed paradigms at work here and system administrators are often faced with tough choices. One is the historical centralized computer installation model that requires that people access a central server for all computing needs. This model is advantageous because the computing resources are located centrally with the necessary management expertise. On the down side it can be very expensive and is not always user friendly or viable for general user access. The second paradigm is based on the distributed computing model that became popular with the advent of the cheap desktop computer. As the desktop environment becomes more advanced, the demand for technical support escalates. It has become obvious that the total cost of ownership is ever increasing despite the affordability of desktop hardware.
Labnet provides a middle ground solution between centralization and distributed computing. By using the best ideas from both models, users are provided with access to their own computer with a personal home directory but system administrators benefit from centrally managed software and user files. We feel that this approach is the way of the future and should be actively pursued by companies and institutions that have a large number of computers.
The only downside to using Linux is that there are some types of software applications that are not yet available and there is a lack of support for some hardware devices so Linux users have to shop carefully before buying scanners, video cameras, printers, etc. With IBM, Novell and other major companies now actively supporting Linux, there will be more applications ported to Linux and Linux will become a more viable operating system for workplace desktop computing. The hardware manufacturers will soon follow suit, providing proper Linux drivers for their hardware and Microsoft will no longer dominate the desktop software market.
First and foremost I would like to acknowledge the efforts of the open source community who have provided such a rich and diverse selection of software for everyone to work with in an uninhibited and free fashion. Without their contribution there would be limited choice in the computing industry today. I would also like to thank the many students and staff members of the Department of Computer Science, the Department of Mathematics and Statistics, Student Housing and the Department of Computing and Communications for their tireless efforts in bringing about the success of Labnet.
Automated Server Software Management Strategy
All software running on Labnet client and server computers is imaged in one way or another. The table below illustrates how the master server images are synchronized to the various servers in a hierarchical fashion. It should be noted that our master server actually has several master images for different computer architectures. For example we currently have a production image for i585 computers, a development image for i586 computers and a development image for AMD64 computers. The distribution of images is implemented hierarchically so that the process can be scaled to include large numbers of servers. The scripts that disseminate images are controlled by our master configuration database server that assigns each server to a syncgroup. The synchronization process is implemented using the rsync utility. A list of exclude files is supplied to rsync so that custom files on each server are not touched by the rsync process.
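As a rough illustration of the rsync step, the sketch below only assembles an rsync command line; it does not run it. The host name, image path and exclude file location are hypothetical. The options used (-a, --delete, --exclude-from, --dry-run) are standard rsync options, but the exact options used by the Labnet scripts are not given in the text, so this choice is an assumption.

```python
def build_sync_command(source, dest, exclude_file, dry_run=False):
    """Assemble an rsync command that mirrors a master image onto a server.

    -a preserves permissions, ownership and timestamps; --delete removes
    files that no longer exist in the image. Files matched by the exclude
    list are neither transferred nor deleted, so per-server custom files
    are left alone.
    """
    cmd = ["rsync", "-a", "--delete", "--exclude-from=" + exclude_file]
    if dry_run:
        cmd.append("--dry-run")
    # A trailing slash on the source copies the directory contents.
    cmd.extend([source.rstrip("/") + "/", dest])
    return cmd


# Hypothetical host, path and exclude file.
cmd = build_sync_command("master:/images/production", "/", "/etc/labnet/sync.exclude")
print(" ".join(cmd))
```

Keeping the exclude list in a file per syncgroup would let the same script serve every tier of the hierarchy, with only the list of protected files varying.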
The master configuration database is a repository for all system configuration information. Whenever a new server is built, the master configuration database is consulted to customize the configuration files on the new computer. Once a server is operating, a daemon, lnsysconfigd, is run that constantly polls the master database checking for updates. If a relevant update is detected, the server then updates the relevant configuration files using the information supplied by the master configuration database and then restarts any affected processes.
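The polling behaviour described above can be sketched as a single "poll once" step that regenerates configuration text and reports which services need restarting. This is not the lnsysconfigd source: the record layout, the file name hosts.conf and the service name dns are hypothetical, and files are held in a dictionary rather than written to disk so that the example stays self-contained.

```python
def render_hosts(records):
    """Render one configuration file from database records (illustrative format)."""
    rows = sorted(records, key=lambda r: r["name"])
    return "".join(f"{r['ip']}\t{r['name']}\n" for r in rows)


# Hypothetical mapping: config file -> (renderer, service to restart on change).
RENDERERS = {"hosts.conf": (render_hosts, "dns")}


def poll_once(records, files, renderers=RENDERERS):
    """Regenerate each file; return the services whose configuration changed."""
    to_restart = []
    for path, (render, service) in renderers.items():
        text = render(records)
        if files.get(path) != text:
            files[path] = text          # a real daemon would write the file here
            if service not in to_restart:
                to_restart.append(service)
    return to_restart


files = {}
records = [{"name": "lab1-01", "ip": "10.0.1.11"}]
first = poll_once(records, files)       # configuration is new
second = poll_once(records, files)      # nothing changed
records.append({"name": "lab1-02", "ip": "10.0.1.12"})
third = poll_once(records, files)       # a relevant update arrived
print(first, second, third)
```

A daemon built this way would call poll_once in a loop with a sleep between iterations; comparing rendered output rather than database timestamps means a service is only restarted when its configuration actually changes.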
It should be noted that the differences in functional behavior of the servers in tier 2 and 3 of this diagram are not based on there being differences in the software but rather as a result of differences in the configuration files that enable and disable the functionality of a particular server.
Some servers function as application servers and contain additional file systems. These servers must contain the images for the client computers as well as their own server image. These client images are synchronized using rsync and have their own specifically excluded files.
The figure above shows the additional file systems that are required for the support of diskless clients. The diskless Linux client application file system is the file system that all client computers mount via NFS when they boot. The client computers mount this partition as their root directory and it is mounted read only so that users cannot modify the image. Each client computer must have a writable /var partition so there is an additional partition devoted to individual computer /var partitions. The name of the /var partition mount point is /diskless/vars/<hostname>.
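To make the layout concrete, the sketch below generates NFS export entries for a read-only shared root and one writable /var directory per client. Only the /diskless/vars/<hostname> naming comes from the text above; the root image path /diskless/root, the host names and the export options are assumptions chosen for illustration.

```python
def render_exports(hostnames, root_path="/diskless/root", vars_base="/diskless/vars"):
    """Render /etc/exports-style lines for a set of diskless clients.

    root_path and the option strings are illustrative assumptions;
    only the vars_base/<hostname> layout is taken from the description.
    """
    lines = []
    # One shared root image, exported read-only to every client.
    clients = " ".join(f"{h}(ro,sync,no_root_squash)" for h in hostnames)
    lines.append(f"{root_path} {clients}")
    # One writable /var directory per client, exported only to that client.
    for h in hostnames:
        lines.append(f"{vars_base}/{h} {h}(rw,sync,no_root_squash)")
    return "\n".join(lines) + "\n"


exports = render_exports(["lab1-01", "lab1-02"])
print(exports, end="")
```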
The diskless clients also mount a repository of Microsoft Windows XP and Windows'98 images. For dual boot computers, one of these images will be synced to the local hard drive. The Windows XP network share is mounted read-only by the WinXP startup scripts and is used to store the bulk of the Microsoft software.
Automated Client Software Management Strategy
The client computers, those computers used by the end users, all boot from the network using the PXE boot protocol. When a computer is reset or turned on it starts the PXE boot process by sending out DHCP requests. After the DHCP server has responded with the required information, the PXE boot protocol then uses the tftp protocol to download the second stage of the boot program. In this case it is a Linux boot loader and it further downloads the Linux kernel image from the DHCP server which is always the application server assigned to that specific client. The boot parameters such as hardware address, kernel, IP address and boot menu are all configured via the master configuration database. (See screen shot in Appendix VII.)
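As an illustration of how per-client boot parameters might be turned into DHCP configuration, the sketch below renders a host entry in ISC dhcpd.conf syntax. The text does not say which DHCP server or boot loader Labnet uses, so the dhcpd format, the boot file name pxelinux.0, and all names and addresses here are assumptions.

```python
def render_dhcp_host(name, mac, ip, app_server, boot_file="pxelinux.0"):
    """Render one host stanza in ISC dhcpd.conf syntax (assumed DHCP server).

    next-server names the TFTP server holding the boot loader, which in
    this sketch is the application server assigned to the client.
    """
    return (
        f"host {name} {{\n"
        f"  hardware ethernet {mac};\n"
        f"  fixed-address {ip};\n"
        f"  next-server {app_server};\n"
        f'  filename "{boot_file}";\n'
        f"}}\n"
    )


# Hypothetical client record, as it might be read from the configuration database.
stanza = render_dhcp_host("lab1-01", "00:11:22:33:44:55", "10.0.1.11", "10.0.1.1")
print(stanza, end="")
```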
Once the Linux kernel starts running, it in turn mounts the root partition from the application server and starts the system initialization scripts. One of these scripts checks to see if this computer is a dual boot computer and if so, it starts the image synchronization process with information stored on the application server.
When a user logs into a Linux session, the home directory is mounted through NFS from the user's home directory server elsewhere on the network. When a user logs into a Windows XP session, the Windows XP network application share is mounted from the application server as M: and the user's home directory share is mounted as H:. A public share, P:, is also mounted from another network server. When the user logs off at the end of the session, the computer reboots and the PXE boot process repeats.
To enhance boot speeds we typically use gigabit connections from the servers while connecting to the clients at 100 megabits. We are currently testing port aggregation and jumbo packet strategies to further enhance the performance of Linux on our diskless client computers using gigabit switches. In essence the Ethernet is being used as a bus for the disk, so high performance is an asset. We tend not to use the application servers for other tasks so that they can dedicate their memory resources to providing a large memory cache for the read only blocks that are requested by the client computers. With a large enough memory cache, the application servers do a surprisingly small number of disk accesses.
Just as with the servers, all the localized client configuration is controlled and managed from the master configuration database.
Imaging Microsoft Windows XP
The process of imaging the Microsoft Windows XP image is started as soon as Linux starts running. This ensures that a pristine copy of the Windows XP image has been loaded on the computer before the next user sits down at the workstation. For this reason it is essential that this imaging be very fast, i.e. less than 1 minute. To do this, a number of strategies have been employed.
The first is that the local hard drive image is kept relatively small, on the order of a couple of gigabytes. This means that we have to direct install scripts to install the bulk of the software on the M: network application drive.
The second strategy is to copy only those blocks that are actually in use. This is accomplished by the imaging program that is aware of Microsoft file formats.
The third strategy is to copy only those blocks that have changed. This is accomplished by precomputing checksums for all active blocks and storing the checksums on the application server along with the master image when the image is being generated. Whenever the client computer reboots and starts the imaging process, the client computer computes the checksums of the active blocks and only copies blocks from the master image on the server if the checksums do not match. In most cases this can easily be done in under 1 minute in parallel with the other Linux startup procedures. Usually by the time the X server displays the login screen, the Windows image has been completely synchronized.
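The second and third strategies can be sketched together as follows. This is an illustration of the idea rather than the Labnet imaging program: the 4 KiB block size and the SHA-1 checksum are assumptions, the list of active blocks is supplied by hand rather than derived from the file system, and the "disk" is an in-memory byte array.

```python
import hashlib

BLOCK = 4096  # assumed block size


def checksums(image, active_blocks):
    """Precompute a checksum for every active block of the master image."""
    return {
        i: hashlib.sha1(image[i * BLOCK:(i + 1) * BLOCK]).hexdigest()
        for i in active_blocks
    }


def sync_image(local, master, master_sums):
    """Restore only the active blocks whose local checksum differs.

    local is a mutable bytearray standing in for the client disk;
    returns the list of block numbers that had to be copied.
    """
    copied = []
    for i, expected in sorted(master_sums.items()):
        start, end = i * BLOCK, (i + 1) * BLOCK
        if hashlib.sha1(local[start:end]).hexdigest() != expected:
            local[start:end] = master[start:end]
            copied.append(i)
    return copied


# Four-block master image; block 1 is unused, so it is never checked or copied.
master = b"".join(bytes([i]) * BLOCK for i in range(4))
sums = checksums(master, active_blocks=[0, 2, 3])

local = bytearray(master)
local[2 * BLOCK + 10] = 0xFF          # a user session modified block 2
copied = sync_image(local, master, sums)
print(copied)
```

Since the checksums of the master image are computed once when the image is generated, the client only has to read its own disk and fetch the mismatching blocks, which keeps network traffic proportional to what the last session changed.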
The Login Process
Logging into a Labnet session is simply a matter of selecting the session type, Linux or Windows XP, and typing in a user name and password. The session type can be selected by clicking in the lower left of the greeter window. The user name is typed in the location shown. The greeter then prompts for the password.
Note that there are some more advanced options on the menu bar at the bottom of the screen. The session menu list allows the Linux user to further select from a series of window managers. This can be useful when moving from a low end computer to a high end computer. One can easily select the window manager that best matches the capabilities of the computer.
Automated User Backup Services
Labnet user file servers are automatically backed up on a nightly basis to backup servers with large RAID arrays. This process is mediated by scripts that extract information from our master configuration database. Typically we have a 1:5 ratio of user file space to backup space. The files are copied to the backup servers using the rsync utility invoked in a special way designed for doing incremental backups.
In the first phase of the rsync process, a clone is made of the previous backup's directory hierarchy, using hard links to represent the files in each directory. This means that the only additional space used is that taken by the directories themselves. In the second phase, files that have changed in the live file system are replaced in the clone by their newer counterparts, which breaks the hard links for those files. Links to unmodified files are left untouched. This technique effectively creates an incremental backup that looks, to all intents and purposes, like a live file system. Because files can be easily located, we have developed a web interface that can retrieve files from the backup directories without operator intervention. As a further precaution, the backup servers are located off site.
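The two phases can be sketched in a few lines. This is an illustration of the hard-link technique only, not the Labnet backup scripts, which use rsync; the function name and directory arguments are hypothetical, and the sketch ignores symbolic links, ownership and empty directories left behind by deletions. Removing files that no longer exist in the live file system is an assumption, since the source does not say how deletions are handled.

```python
import filecmp
import os
import shutil
from pathlib import Path


def snapshot(live: Path, previous: Path, new: Path) -> None:
    """Create backup `new` from `live`, sharing unchanged files with `previous`."""
    # Phase 1: clone the previous backup using hard links, so only the
    # directories themselves consume additional space.
    for root, _dirs, files in os.walk(previous):
        rel = Path(root).relative_to(previous)
        (new / rel).mkdir(parents=True, exist_ok=True)
        for name in files:
            os.link(Path(root) / name, new / rel / name)

    # Phase 2: replace files that differ from the live file system.
    for root, _dirs, files in os.walk(live):
        rel = Path(root).relative_to(live)
        (new / rel).mkdir(parents=True, exist_ok=True)
        for name in files:
            src = Path(root) / name
            dst = new / rel / name
            if dst.exists():
                if filecmp.cmp(src, dst, shallow=False):
                    continue  # unmodified: keep the hard link
                dst.unlink()  # break the link; the older backup keeps its copy
            shutil.copy2(src, dst)

    # Assumed behaviour: drop files that were deleted from the live file system.
    for root, _dirs, files in os.walk(new):
        rel = Path(root).relative_to(new)
        for name in files:
            if not (live / rel / name).exists():
                (Path(root) / name).unlink()
```

With rsync itself, the `--link-dest` option gives a comparable result in a single pass by hard-linking unchanged files against a previous backup directory; whether Labnet uses that option or a separate cloning step is not stated in the source.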
User Backup Recovery Web Tool
Labnet offers users the opportunity to recover files from our on-line backups using an easy-to-use web interface. Users first log into the web tool with their user name and password, and are then presented with the following page, which lists the dates on which their home directory was backed up.
By clicking on the desired date (usually the most recent) the web tool presents the user with a graphical view of their home directory that looks something like the following screen shot.
The user can descend through the directory hierarchy until the desired file is reached and then, by clicking on the file, a dialog box is displayed that will allow the user to save the file to a desired location or input it directly into an application.
Note that there is no easy way to restore a large number of files using this web tool and therefore we are occasionally called upon to restore directory trees for users.
Screen Shots of Web Tools for Managing Labnet
The following screen shot shows one of the pages of the web tool for updating the configuration of a Labnet computer. Note that the configuration information is stored in a variable-value pair notation that is highly extensible. To change the boot kernel for the selected host, the administrator simply selects the desired kernel from the drop-down list, or adds it to the text window, and clicks the Submit Changes button.
Note that changes are automatically propagated to the client and server computers by daemons running on the servers.
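As a rough illustration of why a variable-value pair notation is easy to extend, the sketch below parses and rewrites such a record. The `name=value` line syntax, the function names, and the variable names in the usage note are hypothetical; the source does not show the actual notation used by the master configuration database.

```python
def parse_config(text: str) -> dict[str, str]:
    """Parse one 'variable=value' pair per line; skip blanks and '#' comments."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        config[name.strip()] = value.strip()
    return config


def render_config(config: dict[str, str]) -> str:
    """Write the pairs back out in the same notation."""
    return "\n".join(f"{name}={value}" for name, value in config.items())
```

For example, after `host = parse_config("hostname=example-host\nkernel=old-kernel")`, changing the boot kernel is a single assignment such as `host["kernel"] = "new-kernel"`, and a new variable can be introduced without altering the parser or the other records.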
The lab assistants can handle many of the day-to-day user account management issues with an account management web tool. A page from this web tool is shown below:
The bar across the top provides access to other commonly used features. A more sophisticated interface is provided for administrators.
Screen Shots of Nomad Share Mounter
Many students now bring their laptop computers on campus to use in the classroom and in the Library. MUN offers wireless network access to these users and, as a value-added feature, they can access their Labnet home directory and any Labnet printer. Access is provided by Nomad, a VBScript utility that can be downloaded from the Labnet home page. It asks the user for a user name and password and then mounts the home directory or printer as a share on the local computer. Below is a screen shot of the Nomad user interface: