by kishi on April 06, 2007 04:14pm
In my last blog post, “Why Manageability Matters,” I talked about why we chose to work on “Systems Manageability” as a whole and build a grassroots understanding of it within the context of the Linux and open source space. In this post, I’m going to address the methodology and ontology of the Systems Manageability project, which will shed some light on how we approach, design and implement projects in the OSSL. Let’s start with the main goals and purpose behind the project.
I. Systems Manageability Project Goals:
Once we defined what we needed to get into, another realization dawned on us: the sheer volume of data and information staring us in the face. Let’s just say “overwhelmed” would be an understatement. My colleague Steve Zarkos and I immediately realized that it was time to trim the scope of what we were doing and limit ourselves to what was achievable in three months with two people :). This called for drawing up what we considered to be “out-of-scope”, which was:
II. Out-of-scope:
III. Systems Manageability Project Methodology:
The approach taken for the project was simple and scientific. The project was divided into three stages:
IV. Systems Manageability Project Ontology (classification):
The hardest and most challenging aspect of the project was developing some sort of ontology, characterization or classification of the manageability technologies prevalent in IT environments today. The diagram below represents the overall "buckets" defined as part of this exercise, and each section of the diagram is then broken down in detail for the Systems Manageability classification it represents:
In the next post, I will break down the first segment of the ontology, “Provisioning and Deployment,” and discuss our research with all of you. Meanwhile, we always look forward to hearing from you, our audience, and welcome any feedback you may have on the topic. Thank you for tuning in to Port25.
Cheers!
by billhilf on April 09, 2007 02:09pm
When we started Port25 a year ago we certainly had no idea how it would turn out. The process of creating, launching, and evolving Port25, I believe, has many similarities to an OSS project: a small group of motivated individuals, a loosely coupled development model, organic growth, and meritocratic guidance and leadership. These things helped us quite a bit as we broke new ground for Microsoft, and I’m proud of where we are one year in.
Our stated goals include growing an online community, but let me tell you a little secret: Port25 was just as important inside Microsoft as it was outside. Giving Microsoft employees like Sara Ford, Steve Marx, Bruce Payette, Mike Hines (and others) a forum to talk about how their work relates to the community is part of the internal goal. Showing other Microsoft employees how we can have open and real conversations, and even debates, with technologists in the OSS community is part of the internal goal. Providing a place for critical analysis and learning about software built in different development models is part of the internal goal. And there is one goal that I haven’t shared with anyone until now: showing people at Microsoft and in the OSS community that we have to keep all this damn stuff in perspective. It’s important, but it is just software after all.
I’ve spent a long time in open source and commercial software development and businesses. Over these years I have seen positive evolution across the board. Port25 is part of a journey for Microsoft, and we are learning with each and every step. Thank you for listening, participating and creating.
-Bill
by kishi on May 25, 2007 04:18pm
Background: This is Part 4 of my eight-part series on Systems Manageability. In this post, I will explain the second part of the “ontology”: “Systems Configuration.”
Level-Set: System Configuration and Management encompasses all tasks related to configuring a host in a standardized and, when possible, centralized way. Many projects in this category provide a common configuration interface, either command-line or GUI-based, designed to ease typical administrative tasks. Other projects, specifically Cfengine, provide a higher-level, policy-based system for consistent configuration and state management across a set of systems. Again, there are many different tools that can be used here, but we have focused on the most popular ones, such as Webmin, YaST, SSH, VNC and Cfengine. In the paragraphs that follow, we lay out our understanding of these tools after using them in the OSSL:
I. WEBMIN: "Webmin is a web-based interface for system administration for Unix. Using any browser that supports tables and forms (and Java for the File Manager module), you can setup user accounts, Apache, DNS, file sharing and so on." Webmin is very modular in design, allowing third-party developers to add support for a particular service or task relatively easily. Most modules ease or automate a particular administration task, often by editing a configuration file using its specific syntax. Webmin is currently supported by OpenCountry, a company that sells Linux management solutions. The OpenCountry website includes information about Webmin, including two variations of the system that they support.
II. YAST: YaST (Yet another Setup Tool) is an OS installation and configuration utility used primarily on SUSE-based systems. YaST typically serves as the primary control panel on these systems and can be used for a number of configuration tasks, such as adding and removing software, patch management, user management, device configuration, and configuring individual services and daemons. Other common administration tasks, such as obtaining system information and reading server logs, are also possible via the YaST interface. All of these features are implemented as modules, each of which provides a specific function or performs a certain task. These tasks typically involve editing one or more text configuration files on the system in a specific format to configure a service or daemon; on other Linux or UNIX-like systems, the same tasks are typically performed manually via the command line.
The YaST utility is very modular in its design, allowing Novell or third-party providers to add modules to the YaST interface to configure a particular device or service. Many of these modules work independently of each other, and as such are often packaged as individual RPM packages that can be added or removed depending on the software and devices installed on a system. YaST modules are written in a YaST-specific scripting language called YCP, and a module can also call other scripts, such as Perl or shell scripts, to perform a particular task. A CIM module for YaST is also distributed with SLES10, which provides a client interface for the CIMOM (CIM Object Manager) to other YaST2 modules.
The most common administration tasks for which YaST seems to be used are setting up package repositories (discussed further in the Patch Management and Maintenance section), adding or removing software packages, and configuring or initiating online updates. YaST can search for and locate software on remote repositories, retrieve the packages, resolve package dependencies, check each package’s cryptographic signature (if available) and then install the software on the system. Multiple repositories can be configured, located on a hard disk or CD/DVD, or on a remote system reachable via HTTP(S), FTP, NFS or CIFS. Once a repository is configured, it can be indexed for later searching. The software search functionality is very powerful, allowing one to search for packages using many of the attributes available in the RPM package header, such as the description or contents of the package.
Outside of software management, the quality and completeness of YaST modules varies. Many modules (such as the log viewing modules) offer minimal functionality and only work well enough to provide a few basic configuration options.
Complex server configurations will therefore still require one to edit text-based configuration files by hand, or use another configuration engine for the task, such as Webmin. However, many other common tasks, such as configuring display settings or a printer, can be done entirely via YaST.
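Resolving package dependencies is one step in YaST's search-and-install workflow. We did not examine how YaST does this internally, so the following is only a toy illustration of the general idea: a depth-first topological sort that orders packages so that each one's requirements are installed before it, rejecting cycles and packages that no repository provides. The package names and the dependency map are hypothetical.

```python
"""Toy dependency ordering for illustration; NOT how YaST resolves packages internally."""
from typing import Dict, List


def install_order(requested: List[str], deps: Dict[str, List[str]]) -> List[str]:
    """Return packages in an order where every dependency precedes its dependents.

    `deps` maps a package name to the names it requires (hypothetical data).
    Raises ValueError on a dependency cycle or an unknown package.
    """
    order: List[str] = []
    state: Dict[str, str] = {}  # "visiting" while on the DFS stack, "done" once placed

    def visit(pkg: str) -> None:
        if state.get(pkg) == "done":
            return
        if state.get(pkg) == "visiting":
            raise ValueError(f"dependency cycle involving {pkg}")
        if pkg not in deps:
            raise ValueError(f"no repository provides {pkg}")
        state[pkg] = "visiting"
        for dep in deps[pkg]:
            visit(dep)          # place requirements first
        state[pkg] = "done"
        order.append(pkg)

    for pkg in requested:
        visit(pkg)
    return order


if __name__ == "__main__":
    # Hypothetical repository metadata: package -> required packages.
    repo = {
        "apache2": ["libapr1", "openssl"],
        "libapr1": [],
        "openssl": ["zlib"],
        "zlib": [],
    }
    print(install_order(["apache2"], repo))
```

With the hypothetical repository above, the result is libapr1, zlib, openssl, then apache2. A real package manager also has to weigh versions, conflicts and alternative providers, which this sketch ignores.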
III. SSH/SCP/SFTP: SSH (Secure SHell) is likely the most widely used remote administration tool for Linux and UNIX-based systems. The typical SSH toolset includes the SSH client and server, as well as the SCP and SFTP client applications for copying files, both of which simply use the ssh binary as their transport. The following excerpt is from the OpenSSH project home page: "OpenSSH is a FREE version of the SSH connectivity tools that technical users of the Internet rely on. Users of telnet, rlogin, and ftp may not realize that their password is transmitted across the Internet unencrypted, but it is. OpenSSH encrypts all traffic (including passwords) to effectively eliminate eavesdropping, connection hijacking, and other attacks. Additionally, OpenSSH provides secure tunneling capabilities and several authentication methods, and supports all SSH protocol versions." Since almost any task can be performed via the command line, the OpenSSH utilities are likely the most critical component for a Linux administrator to have available. The remote copy and command execution options allow one to build, deploy and run a script on a number of machines relatively quickly and securely. OpenSSH is installed by default in most Linux distributions, although in some distributions the server may be disabled by default or blocked by the firewall.
IV. Cfengine: “Cfengine, or the configuration engine is an autonomous agent and a middle to high level policy language and agent for building expert systems to administrate and configure large computer networks. Cfengine is designed to be a part of a computer immune system. It is ideal for cluster management and has been adopted for use all over the world in small and huge organizations alike.” Cfengine consists of a userspace agent called cfagent, which reads and parses a series of text configuration files and performs tasks on the host based on that configuration, plus a host of other utilities. The configuration syntax of Cfengine is a high-level policy language that allows cfagent to test the system’s configuration and take corrective action based on those tests. For example, cfagent may test that a certain line of text exists within a configuration file and, if it does not, add the text and restart the associated service. The cfagent utility is typically run hourly (or so) via cron, a task-scheduling application, which ensures that misconfigurations are found and corrected within a reasonable time frame.
The example policy below checks that an entry for the root user exists in the /etc/shadow file and that it carries the expected password hash, adding or replacing the line if not. This ensures that all systems in the same class (lab systems or workstations) have the same root password. Cfengine configurations can become very complex, which would likely not surprise anyone who has used the tool. The structure of the policy language eases this problem somewhat, as platform definitions can be made and inherited by other blocks to help determine the appropriate action to take. Because the configuration is essentially a high-level policy language, the various tests must be built and scripted manually. The toolset is, however, enormously powerful when implemented correctly. But as with many open-source technologies, the learning curve can be quite steep, and one must study the complexities of the tool before it can be used competently in a production environment. A version of Cfengine has been ported to the Windows platform to run under Cygwin.
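The "build, deploy and run a script on a number of machines" workflow can be sketched in a few lines that drive the standard scp and ssh clients. This is a minimal sketch, assuming key-based SSH authentication is already set up for each host; the host names and script path are hypothetical, and by default it only prints the commands it would run.

```python
"""Push a script to several hosts with scp, then run it with ssh (sketch)."""
import shlex
import subprocess
from typing import List, Sequence


def build_commands(host: str, script: str, remote_dir: str = "/tmp") -> List[List[str]]:
    """Return the scp and ssh argument vectors that copy `script` to `host` and run it."""
    remote_path = f"{remote_dir.rstrip('/')}/{script.rsplit('/', 1)[-1]}"
    return [
        ["scp", "-q", script, f"{host}:{remote_path}"],
        # BatchMode=yes makes ssh fail instead of prompting for a password.
        ["ssh", "-o", "BatchMode=yes", host, "sh " + shlex.quote(remote_path)],
    ]


def deploy(hosts: Sequence[str], script: str, dry_run: bool = True) -> None:
    """Run (or, with dry_run, only print) the copy-and-execute commands for each host."""
    for host in hosts:
        for argv in build_commands(host, script):
            if dry_run:
                print(" ".join(shlex.quote(a) for a in argv))
            else:
                subprocess.run(argv, check=True)  # stop at the first failing host


if __name__ == "__main__":
    # Hypothetical host names and script; set dry_run=False to actually execute.
    deploy(["web01", "web02"], "./collect-stats.sh")
```

The hosts are handled one at a time for clarity; running them in parallel is a straightforward extension.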
editfiles:
  # We have different passwords for lab systems and workstations.
  linux.shadowpasswords.md5passwords.(!workstations)::
    { /etc/shadow
      SetLine "root:$1$383J33RL$XXXXXXXXXXXXXXXXXXXXXX:12984:0:99999:7:::"
      AppendIfNoLineMatching '^root:.*'
      LocateLineMatching '^root:.*'
      ReplaceLineWith "root:$1$383J33RL$XXXXXXXXXXXXXXXXXXXXXX:12984:0:99999:7:::"
    }
  linux.shadowpasswords.md5passwords.workstations::
    { /etc/shadow
      SetLine "root:$1$gcGWA0qS$YYYYYYYYYYYYYYYYYYYYYY:13027:0:99999:7:::"
      AppendIfNoLineMatching '^root:.*'
      LocateLineMatching '^root:.*'
      ReplaceLineWith "root:$1$gcGWA0qS$YYYYYYYYYYYYYYYYYYYYYY:13027:0:99999:7:::"
    }
Example Cfengine policy to check the password for the root user.
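To make the logic of the root-password policy concrete, here is a Python approximation of what we understand the AppendIfNoLineMatching, LocateLineMatching and ReplaceLineWith combination to achieve: append the wanted line if nothing matches, otherwise replace the first matching line if it has drifted. It is written for illustration, not taken from Cfengine, and the hash values are placeholders. Note that a second run changes nothing, which is what makes hourly runs safe.

```python
"""Approximate the root-password editfiles policy in Python (illustration only)."""
import re
from typing import List, Tuple


def ensure_line(lines: List[str], pattern: str, wanted: str) -> Tuple[List[str], bool]:
    """Make the first line matching `pattern` equal `wanted`; append it if none matches.

    Returns (new_lines, changed). The input list is never modified in place.
    """
    regex = re.compile(pattern)
    for i, line in enumerate(lines):
        if regex.match(line):
            if line == wanted:
                return lines, False          # already compliant: nothing to do
            fixed = list(lines)
            fixed[i] = wanted                # repair the drifted entry
            return fixed, True
    return lines + [wanted], True            # no matching entry at all: append one


if __name__ == "__main__":
    # Placeholder hashes, as in the policy above.
    wanted = "root:$1$gcGWA0qS$YYYYYYYYYYYYYYYYYYYYYY:13027:0:99999:7:::"
    shadow = ["root:$1$old$ZZZZ:13000:0:99999:7:::", "bin:*:13000:0:99999:7:::"]
    shadow, changed = ensure_line(shadow, r"^root:.*", wanted)
    print(changed)                                      # True: root line replaced
    print(ensure_line(shadow, r"^root:.*", wanted)[1])  # False: second run is a no-op
```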
The following example Cfengine policy checks for the existence and contents of the /etc/cron.d/yast2-online-update file on SUSE systems. If necessary, it creates the file and writes a cron entry that schedules a daily check for updates and patches at 3:30 a.m. Editing the file defines the restartcrond class, which in turn triggers the command “/etc/init.d/cron restart” in the suse.restartcrond section, so cron is restarted only when the file has actually changed.
editfiles:
  suse::
    { /etc/cron.d/yast2-online-update
      DefineClasses "restartcrond"
      Umask 077
      AutoCreate
      BeginGroupIfNoLineMatching "^.*[\s\t]+root[\s\t]+online_update"
        AppendIfNoSuchLine "30 3 * * * root online_update"
      EndGroup
    }

shellcommands:
  suse.restartcrond::
    "/etc/init.d/cron restart"
Example Cfengine policy to assure that SUSE systems check for updates daily.
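The second policy relies on a pattern worth calling out: an edit defines a class, and a later section runs only if that class is set. A rough Python analogue follows; the function names are hypothetical, the restart is passed in as a callable so the sketch never touches a real system, and we model only "edit first, then run the shell commands" rather than Cfengine's full action sequence.

```python
"""Rough analogue of DefineClasses + shellcommands: restart only after a real change."""
import re
from typing import Callable, List, Set

CRON_LINE = "30 3 * * * root online_update"
GUARD = re.compile(r"^.*[\s\t]+root[\s\t]+online_update")  # same regex as the policy


def edit_cron_file(lines: List[str], classes: Set[str]) -> List[str]:
    """Append the cron entry unless one already exists; define a class when editing."""
    if any(GUARD.match(line) for line in lines):
        return lines                      # like BeginGroupIfNoLineMatching: skip the group
    classes.add("restartcrond")           # like DefineClasses "restartcrond"
    return lines + [CRON_LINE]


def run_agent(lines: List[str], restart: Callable[[], None]) -> List[str]:
    """One pass: do the edit, then run the command guarded by the restartcrond class."""
    classes: Set[str] = set()
    lines = edit_cron_file(lines, classes)
    if "restartcrond" in classes:         # like the suse.restartcrond:: section
        restart()
    return lines


if __name__ == "__main__":
    calls: List[str] = []
    cron = run_agent([], lambda: calls.append("/etc/init.d/cron restart"))
    cron = run_agent(cron, lambda: calls.append("/etc/init.d/cron restart"))
    print(cron, calls)  # entry added once, restart requested once
```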
That wraps up the “Systems Configuration” section. As always, please let us know whether you found this useful, and send us any comments or feedback you may have. Thank you for tuning in to Port25.
by Sam Ramji on May 31, 2007 05:36pm
I got the chance to meet many extremely smart developers last month at SambaXP, the annual Samba developer conference. After attending I’m convinced that the Samba team knows more about how Windows networking works than most Microsoft developers.
One of the most informative sessions I attended was led by Dr. David Holder, an expert on IP networking and Windows/Linux interoperability. Specifically, he focuses on the IPv6 protocol, implementation, and interop, where he sees great opportunities for improved service levels in a range of applications and environments, but also sees a coming wave of interoperability problems between IPv6 implementations on various platforms.
He’s done some very slick work getting Samba to work with the IPv6 stacks in Windows Vista and Longhorn, which is encouraging, and he lays out a roadmap for future interop work between the platforms.
We are posting the link to his slides along with this podcast of his interview, and David will be available to answer questions posted to the comments section of this page.
Cheers,
Sam
Links:
Dr. Holder's SambaXP “Vista and Samba with IPv6” presentation:
samba-and-vista-with-ipv6v2.pdf
Details regarding how to IPv6 enable Samba4:
http://www.ipv6consultancy.com/ipv6blog/?p=12
Attachment: davidholder.mp3
by Paula Bach on June 12, 2007 07:46pm
Bryan has previously blogged about the project partnership between the Penn State University (PSU) College of Information Sciences and Technology (IST) and the Open Source Software Lab (OSSL). I am at the OSSL here at Microsoft this summer and next as a research intern. The project, which started in May 2007 and will last two years, is my dissertation research. I work with Jack Carroll in the Center for HCI at Penn State. I am a third-year PhD candidate and I study HCI in open source software development.
In this blog I want to talk about interdisciplinarity and multidisciplinarity. Broadly speaking, the information society is like the Wild West, and many challenges as well as opportunities have arisen, especially with information technologies; the Internet, for example, is the frontier of that Wild West. Challenges and opportunities in a new frontier are exciting for business and academia alike. Understanding them, however, requires new ways of investigating. A single discipline can address some of these challenges and opportunities, but complex problems, especially ones involving the intersection of information, people, and technology, can benefit from the expertise of multiple disciplines. This is where a multidisciplinary or interdisciplinary approach can be helpful. Rogers et al. (http://rizzo.media.unisi.it/page2/assets/Rogers_Scaife_Rizzo.pdf) distinguish between the two:
Interdisciplinary usually means “the emergence of insight and understanding of a problem domain through the integration or derivation of different concepts, methods, and epistemologies from different disciplines in a novel way.” Multidisciplinary can be characterized as “a group of researchers from different disciplines cooperate by working together on the same problem towards a common goal, but continue to do so using theories, tools, and methods from their own discipline, and occasionally using the output from each other’s work.” The characterizations differ in whether elements of a discipline are coupled or decoupled.
Although the two terms have been used interchangeably, the subtle differences in problem solving depend both on the kind of problem a team of collaborators is solving and on the investigative skills of the team members. The OSSL takes both approaches to the challenges and opportunities inherent in understanding open source and where Microsoft fits in. This broad approach is evident when comparing Microsoft’s past and current missions: “A computer on every desktop and in every home running Microsoft software” versus “To enable people and businesses throughout the world to realize their full potential.” The mission shifted from technology-centric to people- and organization-centric. This new approach includes a global perspective on key aspects of the information society: people, information, and technology. It is also exemplified by a new type of academic unit called information schools, or iSchools. The joint project, looking at HCI in open source software development, is interesting from a number of perspectives in the space of information, technology, and people. My approach is interdisciplinary, taking a number of concepts and methodologies and combining them using different epistemological perspectives. Please contact me if you would like details on the interdisciplinary nature of the study of HCI expertise in open source software development; it would take too long to expound on here.
Bryan and I recently went to the iSchool at the University of Washington to talk to graduate students and faculty about the project. The research conversation, as it is called, was well attended, especially for a sunny Friday afternoon at the end of the spring semester. (The iSchool dean even showed up!) We talked about the challenges of studying the open source community and about doing interdisciplinary research in an iSchool.
The most interesting aspect of my experience so far in this joint partnership is that I am doing interdisciplinary academic work in a business unit studying open source software development at Microsoft, combining what are normally “separate worlds” (academic/business and Microsoft/open source software). My summer here will entail collecting data on and analyzing HCI expertise in open source software development, as well as looking at HCI expertise in software development internally at Microsoft as a basis for comparison. In this summer series, look for my blog entries as I ponder results from the studies.
by Sam Ramji on June 15, 2007 04:38pm
I had the opportunity to sit down with Javier Soltero, CEO of Hyperic last month in San Francisco at the OSBC. We had a great discussion, which I opened bluntly by saying, “You don’t need to tell me about your software; I’ve seen it, my lab team thinks it’s cool, and we’re impressed.” He was happy to hear it but probably not surprised.
One of the obvious pros of the open source model (like the freeware model of the 90’s) is that you can get what you want without calling anyone or firing off a “please contact me” request to the company’s sales department. Another equally obvious pro is that prospective customers can really walk through the product’s architecture and actual implementation to make sure that the marketing promises (“marketechture”) actually line up with the product being described.
Kishi Malhotra and Stephen Zarkos, the OSSL’s experts on manageability, did a comprehensive teardown of Hyperic and a range of other open source management technologies (such as Nagios and OpenPegasus), which they’ll be posting in the next few days. They found that Hyperic has done a great job of building a low-footprint, easily adaptable management technology and of commercializing it in an open source model. We thought that SIGAR, their agent API, was particularly clever.
Javier and Doug MacEachern (their CTO, and a maintainer for mod_perl among other achievements) spent some time on a podcast with me last week – if you’re interested in hearing their reasons for building Hyperic, how it compares to Nagios, and what they learned in taking their product open source, listen in. They’ll be available to answer questions on this post as well – leave a comment if you’re curious about something they’re doing.
Also, drop us a note and let us know if you are interested in more interviews with open source and interoperability technology leaders on Port25.
by kishi on June 21, 2007 12:16pm
Background: This is Part 5 of my eight-part series on Systems Manageability. In this post, I will explain the third part of the “ontology”: “Monitoring.”
Level-Set – Monitoring: Monitoring and other data collection tools are an essential component of any management strategy. Properly collecting and organizing host data allows for manual, and sometimes automated, reactive corrective measures. This section outlines many of the open source and free software monitoring tools available on the Linux platform. Much of the analysis in this section focuses on the inner workings of these tools as data collection systems, rather than on feature comparisons between the various monitoring applications. The WBEM/CIM overview has been placed in this section because of its basis as a data collection and management system, even though its use is not limited to the confines of this category.
I. WBEM/CIM: The following section gives an overview of the WBEM initiative and the open-source CIM implementations that exist today. The Distributed Management Task Force (DMTF) defines WBEM (Web-Based Enterprise Management) as follows:
“[WBEM is] a set of management and Internet standard technologies developed to unify the management of distributed computing environments. WBEM provides the ability for the industry to deliver a well-integrated set of standard-based management tools, facilitating the exchange of data across otherwise disparate technologies and platforms.”
Core components and industry standards used in WBEM include CIM, CIM-XML, CIM Query Language, SLP (Service Location Protocol, for WBEM discovery) and WBEM URI (Uniform Resource Identifier) mapping. The DMTF has also developed a WBEM Management profile template for the purpose of systems manageability. WBEM has been designed to be compatible with all the major existing management protocols, including SNMP, DMI, and CMIP. There are several open source implementations of WBEM, including OpenWBEM, WBEM Services, OpenPegasus and SBLIM, which are discussed in more detail below. Both client and server implementations of the WBEM standard are available.
CIM, the Common Information Model, “provides a common definition of management information for systems, networks, applications and services, and allows for vendor extensions. CIM’s common definitions enable vendors to exchange semantically rich management information between systems throughout the network. It is a conceptual information model for describing management that is not bound to a particular implementation. This allows for the interchange of management information between management systems and applications. This can be either "agent to manager" or "manager to manager" communications that provides for Distributed System Management.”
CIM includes two components: a specification and a schema.
WBEM (CIM) Architecture Diagram
OpenPegasus:
OpenPegasus is an open-source implementation of the DMTF CIM and WBEM standards, developed under the auspices of The Open Group and licensed under the MIT open-source license. The distribution is available via CVS, and as snapshot images in tar, zip, and (self-extracting) exe file formats on the OpenPegasus web site. Based on the documentation posted on the site, Pegasus is, simply put, an open-source CIM Server for DMTF CIM objects. It is written in C++, translating the object concepts of CIM into a programming model, and includes the CIM Object Manager (CIMOM), a set of defined interfaces, an implementation of the CIM Operations over HTTP and their CIM-XML HTTP encodings, and interface libraries for both clients and providers. It is maintained to be compliant with the DMTF CIM and WBEM specifications, with exceptions noted in the documentation. Pegasus is designed to be modular and inherently portable, and today it builds and runs on most versions of UNIX(R), Linux, and Windows. OpenPegasus includes the following components:
OpenWBEM On SLES10:
OpenWBEM is included in SUSE Linux Enterprise Server 9 and 10, allowing any WBEM-enabled management console to access configuration information on the system. A CIM schema and a MOF compiler (owmofc) are also included as packages in SLES9 and 10; the session below creates the /root/cimv2 namespace, compiles and imports the schema into it, and starts the OpenWBEM daemon.
## Create the namespace called /root/cimv2
SLES10:/etc/openwbem # owcreatenamespace -n /root/cimv2
Creating namespace (/root/cimv2)
## Import the CIM schema.
SLES10:/etc/openwbem # owmofc /usr/share/mof/cimv2.12/cimv212.mof
[ ... Lots of Output ... ]
Compilation finished. 0 errors occurred.
Compiling and Importing the CIM Schema
## Start the OpenWBEM Daemon.
SLES10:~ # /etc/init.d/owcimomd start
Using common server certificate /etc/ssl/servercerts/servercert.pem
Starting the OpenWBEM CIMOM Daemon done
## Check the status of the OpenWBEM service.
SLES10:~ # /etc/init.d/owcimomd status
Checking for service OpenWBEM CIMOM Daemon running
Starting the OpenWBEM Service on SLES10
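Once owcimomd is running, management clients talk to it using CIM operations over HTTP with the CIM-XML encoding mentioned earlier. To show roughly what goes over the wire, the sketch below builds (but does not send) an EnumerateInstanceNames request for a class in the root/cimv2 namespace created above. The element and header names follow our reading of the DMTF's CIM Operations over HTTP specification (DSP0200); the class name, the message ID and the percent-encoding of the CIMObject header are assumptions to verify against the spec and your CIMOM.

```python
"""Build (but do not send) a CIM-XML EnumerateInstanceNames request (sketch)."""
import urllib.parse
import xml.etree.ElementTree as ET
from typing import Dict, Tuple


def enumerate_instance_names_request(
    namespace: str, class_name: str, message_id: str = "1001"
) -> Tuple[Dict[str, str], bytes]:
    """Return (HTTP headers, UTF-8 XML body) for an intrinsic EnumerateInstanceNames call."""
    ns = namespace.strip("/")                          # "/root/cimv2" -> "root/cimv2"
    cim = ET.Element("CIM", CIMVERSION="2.0", DTDVERSION="2.0")
    message = ET.SubElement(cim, "MESSAGE", ID=message_id, PROTOCOLVERSION="1.0")
    simple_req = ET.SubElement(message, "SIMPLEREQ")
    call = ET.SubElement(simple_req, "IMETHODCALL", NAME="EnumerateInstanceNames")
    path = ET.SubElement(call, "LOCALNAMESPACEPATH")
    for part in ns.split("/"):                         # one NAMESPACE element per segment
        ET.SubElement(path, "NAMESPACE", NAME=part)
    param = ET.SubElement(call, "IPARAMVALUE", NAME="ClassName")
    ET.SubElement(param, "CLASSNAME", NAME=class_name)

    headers = {
        "Content-Type": 'application/xml; charset="utf-8"',
        "CIMOperation": "MethodCall",
        "CIMMethod": "EnumerateInstanceNames",
        "CIMObject": urllib.parse.quote(ns, safe=""),  # assumption: percent-encoded namespace
    }
    return headers, ET.tostring(cim, encoding="utf-8")


if __name__ == "__main__":
    hdrs, body = enumerate_instance_names_request("/root/cimv2", "CIM_OperatingSystem")
    for name, value in hdrs.items():
        print(f"{name}: {value}")
    print(body.decode("utf-8"))
```

The body would be sent as an HTTP POST to the CIMOM; the URL path and port depend on how the server is configured.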
II. NAGIOS: Nagios is a system monitoring application designed to monitor remote hosts and applications over a network. It provides a web-based graphical display that shows the status of nodes and of particular applications running on them. The Nagios documentation lists many features, including:
Nagios can poll servers and obtain data in a number of different ways. The most straightforward method is to connect to a remote system directly and test whether the host is available or whether a particular service is running. Data internal to the host, such as free memory or processor usage, however, must be gathered using the Nagios agent, SNMP, another custom script or program, or a standard Nagios plug-in called check_by_ssh, which runs a command on a remote machine and collects the output.
Nagios is configured entirely via text-based configuration files. Hosts and other resources are defined in blocks, which can also inherit information from other predefined blocks, making complex configurations possible and more manageable. Several third-party applications provide a web or other GUI interface to help configure Nagios, but these were not tested for this project. The following configuration block defines a generic host template called “linux-server”. Many of the configuration values, such as “24x7” and “workhours”, are actually defined in other configuration blocks within the Nagios configuration. This allows administrators to assign custom names to specific time periods, such as “workhours”, and use those names elsewhere in the configuration.
define host {
name linux-server
use generic-host
check_period 24x7
max_check_attempts 10
check_command check-host-alive
notification_period workhours
notification_interval 120
notification_options d,u,r
contact_groups admins
register 0
}
Nagios Host Definition Template
Individual hosts are defined in configuration blocks. Below is a sample configuration for an individual host called management. Notice that the use statement inherits the other definitions from the generic template described above, “linux-server”.
define host {
use linux-server ;Name of host template to use.
host_name management
alias Management Server
address 10.197.173.100
}
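The effect of the use statement can be modeled as a layered merge: start from the template's values, then let the object's own directives override them. The sketch below is a simplification, assuming a single template per object, and it drops the template bookkeeping keys (use, name, register) from the result. The generic-host values are hypothetical stand-ins, since that template is not shown here.

```python
"""Resolve Nagios-style `use` template inheritance (simplified sketch)."""
from typing import Dict, FrozenSet

Definition = Dict[str, str]
NOT_INHERITED = ("use", "name", "register")  # template bookkeeping, dropped from the result


def resolve(key: str, objects: Dict[str, Definition],
            seen: FrozenSet[str] = frozenset()) -> Definition:
    """Return `objects[key]` merged with its template chain; closer definitions win."""
    if key in seen:
        raise ValueError(f"template loop at {key!r}")
    obj = objects[key]
    merged: Definition = {}
    parent = obj.get("use")
    if parent:                                    # pull in the template's values first...
        merged.update(resolve(parent, objects, seen | {key}))
    for k, v in obj.items():                      # ...then let this object override them
        if k not in NOT_INHERITED:
            merged[k] = v
    return merged


if __name__ == "__main__":
    objects = {
        # Hypothetical stand-in for the sample config's generic-host template.
        "generic-host": {"name": "generic-host", "notifications_enabled": "1", "register": "0"},
        "linux-server": {"name": "linux-server", "use": "generic-host", "check_period": "24x7",
                         "max_check_attempts": "10", "notification_period": "workhours",
                         "register": "0"},
        "management": {"use": "linux-server", "host_name": "management",
                       "alias": "Management Server", "address": "10.197.173.100"},
    }
    print(resolve("management", objects))
```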
Finally, hosts may be organized into logical groups for easier management. The following hostgroup definition includes five hosts.
define hostgroup {
hostgroup_name test
alias Test Servers
members localhost,management,www,rhel4-production2,network
}
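All of these definitions share one simple shape: define, an object type, and a brace-delimited list of directive/value pairs, with ; starting an inline comment. The following is a minimal parser sketch for that shape, written for illustration; it ignores features such as escaped semicolons and multi-line values, and the function name is hypothetical.

```python
"""Parse simple Nagios object definitions into dictionaries (illustrative sketch)."""
import re
from typing import Dict, List, Tuple

BLOCK = re.compile(r"define\s+(\w+)\s*\{(.*?)\}", re.DOTALL)


def parse_objects(text: str) -> List[Tuple[str, Dict[str, str]]]:
    """Return (object_type, directives) pairs for each `define type { ... }` block."""
    objects = []
    for obj_type, body in BLOCK.findall(text):
        directives: Dict[str, str] = {}
        for raw in body.splitlines():
            line = raw.split(";", 1)[0].strip()   # drop inline ';' comments
            if not line or line.startswith("#"):  # skip blank and '#' comment lines
                continue
            parts = line.split(None, 1)           # directive name, then the rest as value
            directives[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
        objects.append((obj_type, directives))
    return objects


if __name__ == "__main__":
    sample = """
define hostgroup {
hostgroup_name test
alias Test Servers
members localhost,management,www,rhel4-production2,network
}
"""
    for obj_type, d in parse_objects(sample):
        print(obj_type, d["members"].split(","))
```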
Nagios is distributed with a wide assortment of plug-ins that can be used to obtain data or check a particular service. Plug-ins are distributed as a separate package which must be installed with both the server and the agent if an agent is to be used. The Nagios plug-ins are simply stand-alone executable programs, each of which can perform a particular task and return a result code for each service or subsystem being tested. Since plug-ins are individual scripts or binary programs, they often will accept different arguments to change their behavior and what information they return. The command usage of each plug-in must be defined individually within the configuration files using the define command syntax. Some plug-ins can accept multiple options which can be customized when writing the configuration for a particular system. The define command definition provides a sort of usage template so that Nagios will know how to run the command later. Luckily for new users, the default sample configuration files already provide accurate definitions for the default plug-ins. Once one is familiar with how commands are defined, however, new commands or custom scripts can also be defined here as well.
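Because a plug-in is just an executable that prints a line of status text and exits with a result code, writing a custom one is straightforward. Nagios interprets exit codes 0, 1, 2 and 3 as OK, WARNING, CRITICAL and UNKNOWN. The sketch below is a hypothetical check_free_mem plug-in (not one of the standard plug-ins) that reads free memory from the Linux /proc/meminfo file and compares it, as a percentage of total memory, against warning and critical thresholds given on the command line. It would be registered with a define command entry like any other plug-in.

```python
#!/usr/bin/env python3
"""Hypothetical Nagios plug-in: check free memory as a percentage of total (sketch)."""
import sys
from typing import Dict, Sequence, Tuple

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(free_pct: float, warn: float, crit: float) -> Tuple[int, str]:
    """Map a free-memory percentage to a Nagios result code and status line."""
    if free_pct < crit:
        code = CRITICAL
    elif free_pct < warn:
        code = WARNING
    else:
        code = OK
    return code, f"MEMORY {LABELS[code]} - {free_pct:.1f}% free"


def read_meminfo(path: str = "/proc/meminfo") -> Dict[str, int]:
    """Parse lines such as 'MemFree:  123456 kB' into {name: number}."""
    values: Dict[str, int] = {}
    with open(path) as fh:
        for line in fh:
            name, _, rest = line.partition(":")
            fields = rest.split()
            if fields and fields[0].isdigit():
                values[name] = int(fields[0])
    return values


def main(argv: Sequence[str]) -> int:
    try:
        warn, crit = float(argv[1]), float(argv[2])   # e.g. "check_free_mem 20 10"
        info = read_meminfo()
        free_pct = 100.0 * info["MemFree"] / info["MemTotal"]
    except (IndexError, ValueError, KeyError, OSError, ZeroDivisionError) as exc:
        print(f"MEMORY UNKNOWN - {exc}")
        return UNKNOWN
    code, message = evaluate(free_pct, warn, crit)
    print(message)
    return code


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```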
NRPE, the Nagios Remote Plugin Executor, is installed on a remote host. It is designed simply to execute Nagios plugins on behalf of the Nagios server and return the results. The same plugins that are installed on the server must therefore also be installed on the remote host for NRPE to use. A plug-in called check_nrpe is distributed with the NRPE agent and is used by the Nagios server to query the NRPE daemon. NRPE uses a rudimentary access control system to ensure that only particular Nagios hosts are allowed to contact the NRPE daemon. A configuration directive such as the following in NRPE’s configuration file allows communication only with a particular host:
allowed_hosts=10.197.173.100
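The idea behind allowed_hosts is a simple allow-list check on the connecting address. The Python model below is a hypothetical illustration, not NRPE's implementation. It handles IP addresses only (no hostname lookups) and also accepts CIDR ranges, which may go beyond what older NRPE versions supported.

```python
import ipaddress
from typing import List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def parse_allowed_hosts(value: str) -> List[Network]:
    """Turn a comma-separated allowed_hosts value into IP networks.
    A bare address such as 10.197.173.100 becomes a single-host network."""
    return [ipaddress.ip_network(item.strip(), strict=False)
            for item in value.split(",") if item.strip()]


def is_allowed(client_ip: str, allowed: List[Network]) -> bool:
    """True if the connecting address falls inside any allowed network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in allowed)


allowed = parse_allowed_hosts("10.197.173.100")
```

With the directive shown above, only 10.197.173.100 passes the check, and a connection from any other address would be refused before a command is run.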
It is possible to configure NRPE to run nearly any command with any arguments, although the documentation warns against doing so. By default, NRPE will only run the specific commands and arguments listed in its own configuration file on the host itself. In other words, the Nagios server can tell NRPE to execute only the commands specified in the remote host’s /etc/nrpe.cfg file; it may not pass arbitrary commands or plug-in arguments for the agent to execute. Below is a sample NRPE configuration. The specific commands (plug-ins) and their arguments must be specified here, and the Nagios server can then request that NRPE execute one or more of these commands and return the results:
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_disk_root]=/usr/local/nagios/libexec/check_disk -w 20 -c 10 -p /dev/sda1
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
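The restriction described above amounts to a lookup table: the server sends only a command name, and the agent runs the command line it has on file for that name. This Python sketch models that behavior for illustration only. The function names are hypothetical, and tokenizing with shlex is an assumption of the sketch, not a description of how NRPE launches commands.

```python
import re
import shlex
from typing import Dict, List

# Matches lines such as: command[check_users]=/path/to/check_users -w 5 -c 10
COMMAND_RE = re.compile(r"^command\[([^\]]+)\]=(.+)$")


def load_commands(cfg_text: str) -> Dict[str, List[str]]:
    """Collect command[name]=... entries; other directives are ignored."""
    commands: Dict[str, List[str]] = {}
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = COMMAND_RE.match(line)
        if match:
            name, cmdline = match.groups()
            commands[name] = shlex.split(cmdline)
    return commands


def resolve_request(commands: Dict[str, List[str]], name: str) -> List[str]:
    """Return the pre-configured argv for a requested name; refuse anything else."""
    if name not in commands:
        raise PermissionError(f"command {name!r} is not defined in nrpe.cfg")
    return commands[name]


CFG = """
allowed_hosts=10.197.173.100
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
"""
commands = load_commands(CFG)
```

A request for check_users resolves to the exact argument list stored on the host, while a request for any name not defined in the file raises an error instead of being executed.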
By default, NRPE uses SSL for communication between itself and the Nagios server. The SSL parameters are generated at compile time and stored in a C header file called dh.h within the NRPE source tree, and this header is then used to compile both the NRPE daemon and the check_nrpe plugin. As a result, the NRPE daemon and the check_nrpe plugin must be compiled with the same parameters (typically from the same source tree) if one wishes to use SSL communications.
III. Hyperic: Hyperic HQ is a Java-based monitoring application consisting of a central monitoring server and one or more remote agents that report node status information to the server. Hyperic HQ is supported on a wide array of platforms, including Linux, Solaris, Windows, HP-UX, AIX, Mac OS X and FreeBSD. Hyperic distributes two versions of its software: HQ Open Source and HQ Enterprise.
HQ Open Source and HQ Enterprise Feature Set Comparison
Note: As of HQ 3.0 the feature-set distribution between the Open Source and Enterprise versions has changed. Please see https://www.vmware.com/tryvmware/?p=hyperic&lp=1 for more details.
Hyperic Installation and Configuration: Hyperic HQ aims to be quick to install and relatively easy to configure. The installation is performed from the command line and prompts the administrator for all the information it needs to run successfully (administrator password, database information, etc.). Upgrading is also relatively easy: simply run the installer with the -upgrade option.
Hyperic HQ provides a web interface to deliver monitoring alerts and status information to the end user. However, unlike other monitoring applications, the web interface is also the primary configuration interface for the application: all node and agent details, metric options and alerts may be configured directly through it.
The monitoring agent is installed in a similar manner to the server. Because all agent configuration is done via the web interface on the server, the agent installation script only needs login information for the server, the path on the node where the agent files should be installed, and a few other details such as the port numbers on which the server and agent will run. Once the agent successfully registers itself with the server, the administrator can log in to the web interface and import the new system into the list of monitored hosts.
The Hyperic HQ server uses the open source PostgreSQL database to store configuration and monitoring data. PostgreSQL comes prepackaged with the Hyperic HQ software and can be installed and configured automatically by the installer. Alternatively, an existing PostgreSQL or Oracle database server can be used; the installer then prompts the administrator for details about that database so that Hyperic HQ can log in and store its data.
By default, Hyperic HQ also stores its authentication information in this database, but it may be configured to use an external LDAP server if one is available.
Auto-Discovery: A unique feature of the Hyperic HQ monitoring solution is its ability to automatically locate and monitor services and daemons running on the remote node. Once installed on the remote node, the agent can scan for a variety of known services and add them to the host's inventory, after which metrics and alerts can be configured to monitor each service. Hyperic HQ supports two scanning options: auto-scan and file-scan. By default, agents periodically run an auto-scan, which checks the process list for known server types. A more comprehensive file-scan searches the file system on the remote node for known applications. Because it takes longer to run and is more resource intensive, this type of scan must be scheduled and configured manually by the administrator.
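Conceptually, an auto-scan compares the process list against a catalog of known server types and reports anything not yet in the inventory. The Python sketch below illustrates only that idea; it is not Hyperic's implementation. The signature table is hypothetical, and the process list is passed in as plain command lines rather than read from the operating system.

```python
import os
from typing import Dict, Iterable, Set

# Hypothetical signatures: executable basename -> server type.
KNOWN_SERVERS: Dict[str, str] = {
    "httpd": "Apache httpd",
    "mysqld": "MySQL",
    "postgres": "PostgreSQL",
    "sshd": "OpenSSH",
}


def autoscan(cmdlines: Iterable[str]) -> Set[str]:
    """Return the known server types whose executables appear in the process list."""
    found: Set[str] = set()
    for cmdline in cmdlines:
        parts = cmdline.split()
        if not parts:
            continue  # skip empty entries (e.g. kernel threads with no cmdline)
        server = KNOWN_SERVERS.get(os.path.basename(parts[0]))
        if server:
            found.add(server)
    return found


def new_services(found: Set[str], inventory: Set[str]) -> Set[str]:
    """Services discovered by the scan that are not yet in the inventory."""
    return found - inventory


processes = ["/usr/sbin/httpd -k start", "/usr/sbin/sshd -D", "-bash", ""]
discovered = autoscan(processes)
```

A file-scan would follow the same matching idea but walk the file system instead of the process list, which is why it is the slower and more resource-hungry option.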
Alerts and Notifications: Hyperic HQ supports the configuration of alerts based on any metric for any particular resource (such as the host itself) or service running on the host. For example, an alert can be triggered when the Availability metric for a host changes at all, or when it falls below a predefined value. When an alert is triggered, an email can be sent to a predefined address, and depending on the priority of the alert, a message will also be posted to the Dashboard, the Hyperic HQ administration front page. The HQ Open Source version lacks many of the more advanced notification options available in the Enterprise version. HQ Enterprise also supports Recovery Alerts, which can be configured to cancel and reset triggered alerts. In the Open Source version, a triggered alert will keep being triggered until the problem is fixed or the alert is disabled. Recovery Alerts let an administrator automate disabling an active alert and then re-enabling it once the problem is corrected. HQ Enterprise also supports sending SNMP traps as a notification option.
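The difference between the two behaviors described above can be expressed as a small state machine. The Python model below is hypothetical and only mirrors the prose: without recovery, every failing sample fires again; with recovery, the primary alert disables itself after firing, and a paired recovery alert re-enables it once the metric is healthy.

```python
from typing import List, Optional


class AlertWithRecovery:
    """Primary alert plus paired recovery alert (hypothetical model)."""

    def __init__(self, threshold: float):
        self.threshold = threshold
        self.primary_enabled = True

    def evaluate(self, availability: float) -> Optional[str]:
        """Process one metric sample; return 'ALERT', 'RECOVERED' or None."""
        if availability < self.threshold:
            if self.primary_enabled:
                self.primary_enabled = False  # fire once, then stay quiet
                return "ALERT"
            return None
        if not self.primary_enabled:
            self.primary_enabled = True  # recovery alert resets the primary
            return "RECOVERED"
        return None


def alerts_without_recovery(samples: List[float], threshold: float) -> List[bool]:
    """Behavior without recovery alerts: every failing sample fires again."""
    return [s < threshold for s in samples]


samples = [100, 50, 40, 100, 30]
pair = AlertWithRecovery(threshold=90)
events = [pair.evaluate(s) for s in samples]
```

For these samples the paired alerts produce one notification per outage plus a recovery notice, whereas the plain alert fires on all three failing samples.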
Hyperic HQ Plugins: Hyperic HQ plugins are distributed as .jar or .xml files that are deployed on the server and the agent. Plugins can be developed to enhance the collection of metrics from certain applications or services, to locate and inventory new services, and to perform control actions on specific resources. The Hyperic website provides comprehensive documentation on plugin development. Developing and adding a new plugin tends to be more complex than with Nagios or other monitoring applications. However, the framework provided by Hyperic HQ offers advanced APIs that plugins can use to query information on multiple platforms. On Windows, for example, Hyperic HQ includes classes that a plugin may use to access Windows-specific data and functions, such as performance information, registry data, event log information and the Service Control Manager (SCM). Hyperic HQ also supports simple script-based plugins for gathering particular metrics, and even individual scripts or Nagios plugins may be imported and configured for use by the Hyperic HQ server and agents.
SIGAR – System Information Gatherer And Reporter: SIGAR is the primary data collection component of the Hyperic HQ agent. The software is designed to collect system and process information from a number of platforms, including Linux, Windows, Solaris, AIX, HP-UX, FreeBSD and Mac OS X. SIGAR is written in C, and Hyperic provides C, C#, Java and Perl APIs that can be used to integrate SIGAR into other applications. The SIGAR component is licensed under the GNU GPL and is distributed separately from the Hyperic monitoring agent for potential use in third-party applications. The SIGAR API provides a portable interface for gathering system information such as:
user@linux:~/hyperic-sigar-1.3.0.0> java -jar sigar-bin/lib/sigar.jar
Loaded rc file: /home/user/hyperic-sigar-1.3.0.0/sigar-bin/lib/.sigar_shellrc
sigar> help
Available commands:
alias - Create alias command
cpuinfo - Display cpu information
df - Report filesystem disk space usage
du - Display usage for a directory recursively
free - Display information about free and used memory
get - Get system properties
help - Gives help on shell commands
ifconfig - Network interface information
iostat - Report filesystem disk i/o
kill - Send signal to a process
mps - Show multi process status
netinfo - Display network info
netstat - Display network connections
pargs - Show process command line arguments
penv - Show process environment
pfile - Display process file info
pinfo - Display all process info
pmodules - Display process module info
ps - Show process status
ptql - Run process table query
quit - Terminate the shell
route - Kernel IP routing table
set - Set system properties
sleep - Delay execution for the a number of seconds
source - Read a file, executing the contents
sysinfo - Display system information
test - Run sigar tests
time - Time command
ulimit - Display system resource limits
uptime - Display how long the system has been running
version - Display sigar and system version info
who - Show who is logged on
sigar>
Example SIGAR usage from the command-line.
And that does it for the “Monitoring” section. We got a chance to play with many other tools, such as Monit, Argus and OProfile, but I am running out of space... As always, please let us know whether you found this useful, and send us any comments or feedback you may have. Thank you for tuning into Port25.
by Garrett Serack on June 21, 2007 06:50pm
I'm pleased to announce ... er, myself, as the Open Source Community Lead here at Microsoft.
I'd have left this to Sam, but hey--why should he get all the fun?
I'm responsible for building communities of Open Source developers around Microsoft's platforms, both externally, and internally--yes, this means the product groups. I'm really interested in what kinds of things we can start building as Open Source software, and illuminating what we've already done.
I said a few things the other day on my blog that I think bear repeating:
This is a pretty wide-reaching role, which means that I cover a lot of ground. Some of the highlights:
There have been a lot of changes at Microsoft in the last few years that folks can't yet see, and I'm hoping to expose that kind of thing to the world, and to bring the world of Open Source to Microsoft.
I'm not going to expound on the great plans I have in too much detail... I've found that actions speak louder than words and have far more lasting impact. I'm focusing on what Microsoft is doing, and less on what has been said. I mentioned that in my blog too:
I don't get it... Microsoft and Open Source? Are you sure?
I know... I know. Y'all got some reservations about Microsoft with regards to open source. Well, I'm not going to try to convince you of anything. What I am going to do is to shine the light on the things Microsoft is doing to create communities in the Open Source world.
Add to that, I'm doin' some rustlin' inside of the company itself--as expected, there are a few tenderfoots 'round here who would just as soon reckon we didn't bother. Well, I got a cattle brand heatin' up just for the conversation.... We'll just see about that.
Somethin' about me:
I joined Microsoft in the fall of 2005 as the Community Program Manager of the CardSpace team, and I've been working with companies and the open source community to build digital identity frameworks, tools and standards to shape the future of internet commerce. I'm also co-writing a book titled Understanding CardSpace, which should be available in the fall of 2007. Before moving to the Puget Sound area, I had a lengthy career as a Software Development Consultant, moving from Developer, to Architect, to Mentor over the course of the last 16 years. As a life-long code monkey, I've pounded out code on more than 20 platforms in 35 different languages, and I see no reason to stop there. I've put code into many open source projects, and I'd like to think that I strongly share the Open Source vision that permeates information technology everywhere. You can catch all my posts on my blog at http://fearthecowboy.com .
What's Next:
In my next blog post I'll detail the promise--that is, my commitment to the community. I think it's important to know what you can expect, as well as my boundaries. I'll also have communication channels set up so that you can talk to me, either publicly or via confidential email.
Garrett Serack
by kishi on June 29, 2007 03:44pm
Level-Set - Patch Management: Patch Management and Maintenance focuses on the solutions available to deploy and install software updates on Linux systems, with a primary focus on Novell-based Linux systems. This is going to be a very short blog, because the only widely used open source tool that I could find is YaST. I know there are tons of solutions out there, some proprietary, like RHN, and some custom built, but YaST was the only common thread we could recognize. A deeper look at YaST and its online update abilities follows:
YaST Online Update Utility
Probably the most commonly used and important modules in YaST are those related to software management (adding and removing software) and patch management. Software and updates for a typical SUSE system are obtained from software repositories, which are local or remote software inventories from which new software or updates may be obtained. At a deeper level, the SLES9 package management system uses the common rpm utility to install, remove, and update packages and to manage the package and dependency database. Although this subsystem is similar to Red Hat’s, Novell has chosen a very different approach to distributing its patches: so-called patch RPMs. With many RPM-based distributions, when a package needs to be updated for one reason or another, the distributor modifies or patches the original source tree and recompiles and repackages the software to produce a new RPM for that package. In these cases, the new RPM is simply an updated version of the original RPM.
Novell has taken a slightly different approach with patching via RPMs. Instead of updating and repackaging the entire package, Novell updates the original source tree, recompiles, and then produces a delta (or diff) between the original binaries in the package and the newly patched and recompiled binaries. The delta is a binary file that describes the differences between two binary files. The deltas are then packaged within an RPM and distributed to clients, and the patch RPM can be installed manually or automatically in the same way as a standard RPM. An advantage of this technique is that patches are often smaller, typically anywhere between 5KB and 8MB depending on the size of the package and the changes being applied. This often makes the update process much faster than it would be with full RPMs, especially for large applications.
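To show why a delta can be so much smaller than a full package, here is a toy binary delta in Python built on the standard library's difflib. It is a generic illustration of the delta idea and makes no claim about the actual format inside Novell's patch RPMs. The delta stores "copy this range of the old file" instructions plus only the literal bytes that are new.

```python
from difflib import SequenceMatcher
from typing import List, Tuple, Union

# A delta is a list of operations:
#   ("copy", start, end) -> reuse old[start:end]
#   ("data", payload)    -> literal bytes that exist only in the new file
Op = Union[Tuple[str, int, int], Tuple[str, bytes]]


def make_delta(old: bytes, new: bytes) -> List[Op]:
    """Describe new in terms of old: copy matching blocks, ship the rest."""
    ops: List[Op] = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("data", new[j1:j2]))
        # "delete": the old bytes are simply not copied
    return ops


def apply_delta(old: bytes, ops: List[Op]) -> bytes:
    """Rebuild the new file from the old file plus the delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += old[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


def literal_size(ops: List[Op]) -> int:
    """Number of new bytes that must actually be shipped."""
    return sum(len(op[1]) for op in ops if op[0] == "data")


old = "".join(f"record {i:04d}\n" for i in range(300)).encode()
new = old.replace(b"record 0150\n", b"record 0150 (patched)\n")
delta = make_delta(old, new)
```

Applying the delta to the old data reproduces the new data exactly, yet the only literal payload is the handful of inserted bytes; everything else is expressed as copy instructions against the file already on disk.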
Major updates to the stable SLES9 branch are released as an installable “service pack”. Novell typically recommends installing service packs via YaST2, from either a CD-ROM or a network location that contains the service pack files. Alternatively, the Online Update module of YaST2 can be used to update the system manually or automatically. In this case, the service pack is distributed as a large number of individual packages, similar to how Red Hat distributes major updates (e.g. RHEL4 U4). Unlike Red Hat, SLES9 does not currently have an email mechanism to inform the administrator when a patch is automatically downloaded and installed. However, it maintains a log file with information about each automatic update in /var/lib/YaST2/you/youlog. This log is generally very easy to read, and lets an administrator discover when, or whether, a patch RPM was downloaded and installed.
There are other ways to find information about installed patches, however. By default, SLES9 archives each patch RPM that is downloaded and installed. Full RPMs will also be archived if they were installed via YaST2 after the original system installation. This functionality can be disabled with YaST2, of course, although it can sometimes be useful to maintain the archive if a patch ever needs to be reinstalled.
1. YaST Software/Update Repositories
Software repositories are typically added manually via the Installation Source module in YaST, or they can be discovered using SLP (Service Location Protocol). From this module, one may add references to locations from which to receive updates; these references typically take the form of a URI or a directory path. YaST supports the following software repository references:
Using this approach, it is also quite common for an administrator to set up a centralized repository for software and updates. A single server obtains updates from Novell, and other servers on the LAN then pull patches from that central patch server using one of the above protocols.
2. YaST Security
Although the software repositories for SLES and SLED distributions are typically operated by Novell, it is quite possible to add third-party repositories to obtain software not offered by Novell, or even different versions of the same software packages. Novell warns against this, however, because repositories not controlled by Novell can lead to the installation of untested or possibly malicious software. This could ultimately compromise security, but more likely it will result in software instability and RPM package conflicts.
All official software and patches distributed by Novell are cryptographically signed, and the signatures can be verified with Novell’s public key. The public keys used to verify these signatures are typically obtained from the official SLES/SLED CDs or DVDs, but may also be obtained from Novell’s website. Once these public keys are accepted and imported, any software package or update with an invalid signature will produce a warning and may not install without user intervention.
3. YaST Automatic Updates
Automatic updates can be configured via YaST’s Online Update Setup module, which allows a user to schedule updates to occur at a particular time, either daily or weekly. On the back end, this module simply installs a new entry for cron, the task scheduler, which periodically runs another program to check for and install updates pushed out by Novell.
In earlier SUSE-based systems, YOU (YaST Online Update) was used to automate the installation of update packages: cron would execute a shell script called /usr/bin/online_update, which automated the patch installation process. Newer versions of SUSE, including SLED10, use a similar process, but instead of a shell script they use a utility called rug. The rug utility is the command-line interface to the ZENworks management agent present on newer SUSE systems.
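The daily or weekly schedule chosen in the Online Update Setup module maps naturally onto a standard five-field cron time specification. The Python helper below is a hypothetical illustration of that mapping. The exact entry YaST writes is not shown in this post, and the sketch invokes /usr/bin/online_update without arguments only because no arguments are documented here. It produces the per-user crontab format; /etc/crontab and /etc/cron.d files would additionally need a user field.

```python
from typing import Optional


def cron_entry(hour: int, minute: int, weekday: Optional[int] = None,
               command: str = "/usr/bin/online_update") -> str:
    """Build a crontab line: daily if weekday is None, otherwise weekly.

    Field order is: minute hour day-of-month month day-of-week command.
    weekday uses cron's numbering, 0 = Sunday ... 6 = Saturday.
    """
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute 0-59")
    if weekday is None:
        day_of_week = "*"          # every day
    elif 0 <= weekday <= 6:
        day_of_week = str(weekday)  # once a week on that day
    else:
        raise ValueError("weekday must be 0 (Sunday) to 6 (Saturday)")
    return f"{minute} {hour} * * {day_of_week} {command}"


daily = cron_entry(3, 30)       # every day at 03:30
weekly = cron_entry(3, 30, 0)   # every Sunday at 03:30
```

Choosing "weekly" only changes the day-of-week field, which is why a single scheduling module can cover both options with one cron entry.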
If you are running any open source tools or applications in your environment to push patches and manage online updates, we would REALLY like to hear what you have to say. As always, THANK YOU for tuning into Port25.
by jcannon on July 05, 2007 03:37pm
We're nineteen days away from OSCON, and we're very excited about participating in this year's event. Microsoft is a Diamond sponsor of OSCON, and we have a number of interesting open source and Linux interoperability sessions and keynotes planned throughout the show. For those who can attend, we hope you'll join us for some of the highlights below:
If you can't attend in-person, stay tuned to Port 25 for coverage of OSCON, the sessions above - and more... Jamie.
by billhilf on July 26, 2007 09:15am
Today, Microsoft took another step in its relationship with the open source software community. We did this by bringing up a new web property that clearly outlines Microsoft’s position on OSS by providing specific information about Microsoft, the OSS community and the interaction between the two. The new site also details information about getting started with OSS and Microsoft technologies. We'll keep the site updated with new content featuring Microsoft’s engagements with the OSS community - be that events like OSCON, partnerships, offers or just interesting articles highlighting different work we're doing across the company. Port 25 will continue to be the source for technical analysis and community with the Open Source Software Lab.
Visit the site, read the articles, send feedback. Thanks for participating.
by jonrosenberg on July 26, 2007 12:00pm
This is my first blog post on Port 25, and timely as my team and I are attending OSCON with the folks from Bill Hilf’s team.
I have some thoughts regarding the future of open source and how an organization matures along with the movement it helped to create. As Director of Source Programs at Microsoft, I can attest to the value of keeping up with your own growth. We started on a journey over three years ago with the release of Windows Installer XML on SourceForge. At the time, the project required the approval of our Group Vice President and a herd of lawyers. The reactions of our colleagues were mixed, although as far as we know, none of our kids were beaten up at school as a result of what we were doing. Today, Microsoft has published 175 projects on CodePlex, and we have written a pair of open licenses, each under a page in length, that have passed the 500-project mark in adoption as others in the community have decided to use them. I also run a training class that teaches people around the company how to engage in open source projects and make them successful. The volume of projects over the past year has forced us to develop processes for approving and publishing projects that are easy to understand and administer.
As Microsoft’s engagement with open source grows, we have to move from being trailblazers to being road-builders. When you’re blazing a trail, organization, bureaucracy, and majority rule are a burden. In the beginning, a passionate group of people with strongly held beliefs and the will to persevere in the face of doubts and doubters is what it’s all about. When the trail is blazed and you’re keeping a four-lane road open, the challenges are very different. Traffic laws, driver’s licenses, public works, and law enforcement are all necessary and these things require the broad support of the people who use the road and live on the adjacent property. There’s nothing quite as effective in gaining this support as giving people a voice in how things are run. As we look forward to the next three years, we already see the needs of our constituents driving our priorities for licensing, infrastructure, and process. Although open source at Microsoft and the OSI are two different animals, I would submit to you that both are at a point in their maturity where their constituencies need to become more involved to maintain growth. While it’s important to focus on the needs of a growing community membership, it’s also important to remember why you started it in the first place. In Microsoft’s case, the reason is simple: Customers. IT professionals told us they wanted both platform choices and platform interoperability. Developers told us that they wanted more open collaboration and that the language of that collaboration is code. In response, Microsoft has reached interoperability agreements with several key vendors of open source software, CodePlex is now supporting 2,000 collaborative development projects, and the features of CodePlex itself are largely driven by the votes of the community.
Today, we reached another milestone with the decision to submit our open licenses to the OSI approval process, which, if the licenses are approved, should give the community additional confidence that the code we’re sharing is truly Open Source. I believe that the same voices that have been calling for Microsoft products to better interoperate with open source products would voice their approval should the Open Source Initiative itself open up to more of the IT industry. So what about the flip side of the OSI becoming a membership organization? Could they really be voted out of existence or rendered ineffective? It doesn’t seem likely to me. Participation in the OSI and adherence to OSI licensing guidelines and Open Source definitions is entirely voluntary. If it isn’t serving the best interests of the community, the community will go elsewhere. Anyone considering an effort to “vote the organization into the ground” would surely realize that such heavy handedness would be self-defeating. That’s not to say that a new membership structure wouldn’t lead to change, but I believe that these changes would have to be the result of vigorous consensus building and that’s probably not a bad thing.
I look forward to the submission process and welcome feedback from the community as we continue to grow together.
by Paula Bach on August 01, 2007 03:35pm
In my last blog I talked about interdisciplinarity and multidisciplinarity and a little bit about my research this summer. In part 1 and 2 of this blog I am going to talk more about the research I have been doing here at Microsoft. Over the last few months I have been looking at a phenomenon called usability expertise. Anybody who has had difficulty using a product has some experience with usability expertise. Usability expertise is knowledge about how to design an artifact to ensure users experience product effectiveness, efficiency, and satisfaction in a specified context of use. Even if people are not experts in Human Computer Interaction (HCI), they can experience a lack of usability expertise in the design of the product.
HCI experts are actually quite rare because the field is young and underdeveloped. The field of HCI is newer than computer science. HCI grew out of computer science about fifteen years after the software engineering crisis in the sixties and although Human Factors is about fifty years old, it has not necessarily been linked to software engineering like HCI has. Software development has included a user interface role to design and develop the human-computer interface, and although some companies still employ user interface developers, HCI experts include UI designers, Usability Engineers, User Experience Researchers and Designers, and Interaction Designers—roles that go beyond the interface and include field research, visual design, and lab studies, for example.
Although the obvious place to look for usability expertise is in the knowledge of HCI experts, I am interested in what role this expertise plays in software development. Just having HCI experts available is not enough to ensure good usability. I want to know who has usability expertise, how it is communicated among project members, and how it is used to make decisions. To find these things out the research looks at both proprietary and open source software development settings. What I am reporting here is an overview, or summary, of preliminary findings. I am still analyzing the data and will publish “official results” in the next year and a half while I work on and finish my dissertation. The research seeks to understand the role of usability expertise in software development and takes that understanding to inform the design of a feature or tool on CodePlex that will support usability expertise for projects interested in making sure their software is usable by their intended user base.
Usability expertise in the context of design is related to design rationale, or more specifically usability design rationale. Design rationale is the "the capture, representation, and use of reasons, justification, notation, methods, documents, and explanations involved in the design of an artifact" (from the book Design Rationale by Moran and Carroll). Since design rationale is a well defined concept that has many details, its presence in real design discussions may be fragmented. This fragmentation might be better understood as usability expertise. So a rough definition of usability expertise might be the “stuff” needed to talk about and make decisions about usability during software development. The “stuff” could be the elements in design rationale or something people have not talked before. In this sense my discoveries made while investigating the role of usability expertise could be groundbreaking or they could be well known in the software development communities. Either way reporting the findings of the role of usability expertise should be interesting. In fact, several people, both at Microsoft and the open source communities I surveyed have already stated that they would like to see the findings, so this is encouraging.
I am collecting data in a number of ways: surveys, interviews, and observations. I surveyed people at Microsoft who are part of a project's software development process, namely usability experience researchers and designers, developers, and program managers. In the open source world, I posted the survey to major projects that met the criteria for overtly caring about usability, namely that they had a usability list and at least one person listed as a usability expert. The Microsoft usability expertise survey is still collecting responses, and although I am still working with the open source survey data, I can mention a few things.
In the open source survey, survey fatigue affected about half of the 125 respondents: only 56 made it to the last question. The survey had two open-ended questions asking about the importance and challenges of usability in open source. Usually open-ended questions are best saved for the end, after the more important questions have been answered. The tradeoff here was that the open-ended questions were themselves important, and placing them at the end could have biased the responses, because the rest of the survey asked about specific aspects of the importance of, and challenges with, usability.
Responses about the importance of usability clustered around ease of use, simplicity, and consistency, with each category claiming about a quarter of the responses. About 10% of the respondents stated that issues related to system performance were important for usability. As for challenges, about a quarter of the respondents reported that the challenges with usability in open source software development were developer-based. These included not valuing usability, not having usability expertise other than self-referential expertise (based on one’s own experience), and communication problems related to common ground. Common ground is reached when two people share a mutual understanding such that one person knows that the other person knows that the first person knows. Common ground is more difficult to reach in computer-mediated environments than face to face because fewer channels exist to help with understanding; face to face, you can use people’s expressions and gestures to help you understand what they are saying. Other categories included lack of resources and lack of process (each at about 10% of the responses). Other questions I am asking of the data include the following:
1. Who has usability expertise?
2. How is usability expertise communicated?
3. How is usability expertise used to make decisions?
4. Who cares about usability expertise?
5. How available is usability expertise?
The data may not be able to answer these questions fully, but it will get me closer to asking different questions that may be more relevant to the data. I am also conducting interviews, which may address the questions as well and reach a depth that surveys cannot.
I have been scheduling and conducting interviews with Microsoft people and will report on those preliminary findings in the next blog. I will conduct the open source interviews via video conference when I get back to Penn State. The open source usability people I am going to talk to are all over the world: US, Canada, Germany, Australia, and France.
I have also been observing three open source projects, looking at their email lists and other interesting sources, such as conversations in the bug tracker, how a usability issue is handled in the bug tracker, and UI specifications. I chose three ‘big’ open source projects that attend to usability, since I wanted diversity in the projects and a wide user base. I spent 8 weeks observing the workings of usability in Firefox, KDE, and OpenOffice.org. The discussions on the email lists vary considerably. Some are short and polite, with a developer inquiring about the usability of a particular design change or feature he is thinking about. Others are heated, drawing in users, developers, and usability people who try to hash out the merits of a feature.
The most often used design rationale, or type of usability expertise, is self-referential. The people on the lists (in the beginning, mostly users or user/developers) respond to a feature proposal by speculating about the usability of the change based on their own experience. Since most of the users on the lists are advanced or power users, this might not be representative of the main user base, at least for the three projects I was studying. I don’t know whether the projects have any data about their user base, but it may be that the email lists are only one input into decisions about usability in those projects. Despite the openness of the discussion lists and other aspects of development, some decisions are made ‘behind the scenes’. Possibly, the ‘behind the scenes’ usability expertise that informs decisions about which usability fixes to include in the next release is used much as usability expertise is used in proprietary decision making. This is something I will consider when investigating the role of usability expertise in both environments.
by Sam Ramji on August 03, 2007 02:51pm
Back in the spring, Sam Ramji attended an Olliance event called the Open Source Think Tank. It's a small gathering, attended by roughly 100 executives and influential developer-users of open source software. During one of the sessions, Sam and Justin Steinman, Novell's Director of Marketing for Linux and Open Platforms, took an impromptu moment to answer some tough questions about the nature of the Microsoft-Novell partnership.
Many of the questions had been asked before, and in fact have been posed more than once on Port 25. We thought these discussions would interest the community at large, so Sam & Justin recently hopped on the phone to answer them in podcast format. Take a listen. I try to emcee, but these are tough guys to keep on one topic :) As always, we welcome feedback, and we'll invite Sam & Justin to answer the comments. If you want more information on the Novell partnership, you may want to check out moreinterop.com, home to most announcements, events and information related to the partnership.
by kishi on August 07, 2007 01:57pm
Level-Set – Log Management: This section covers open-source technology focused primarily on host-based logging, log file rotation and log file analysis. Many of these tools are common free and open-source utilities that are distributed and preconfigured with most major Linux distributions, including those from major vendors such as RedHat and Novell.
I. Logrotate
Logrotate is a very popular application used on many Linux systems, including all RedHat- and SUSE-based systems. The logrotate utility typically runs periodically via cron, a task scheduling application. It reads a configuration file (/etc/logrotate.conf) and archives and compresses log files according to that configuration. Administrators can configure whether log files are rotated based on age or size, and how long backlogs should be kept. As newer archives are created, older archives beyond that limit are discarded.
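To make the rotation scheme concrete, here is a minimal Python sketch of size-based rotation with a fixed number of compressed backlogs, loosely modeled on logrotate's size, rotate and compress settings. It is not logrotate's implementation: the function name rotate, the .N.gz naming and the copy-then-truncate step are assumptions for illustration (logrotate normally renames the live file, and offers copy-and-truncate only as an option).

```python
import gzip
import os
import shutil


def rotate(log_path: str, keep: int, max_bytes: int) -> bool:
    """Hypothetical helper: rotate log_path once it reaches max_bytes,
    keeping at most `keep` archives named log_path.1.gz ... log_path.<keep>.gz.

    Returns True if a rotation happened.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    if not os.path.exists(log_path) or os.path.getsize(log_path) < max_bytes:
        return False  # file is still small enough; nothing to do

    # Discard the oldest archive if we are already at the limit.
    oldest = f"{log_path}.{keep}.gz"
    if os.path.exists(oldest):
        os.remove(oldest)

    # Shift the remaining archives up by one: .N-1.gz -> .N.gz, ..., .1.gz -> .2.gz
    for n in range(keep - 1, 0, -1):
        older = f"{log_path}.{n}.gz"
        if os.path.exists(older):
            os.rename(older, f"{log_path}.{n + 1}.gz")

    # Compress the live log into .1.gz, then truncate it in place.
    with open(log_path, "rb") as src, gzip.open(f"{log_path}.1.gz", "wb") as dst:
        shutil.copyfileobj(src, dst)
    open(log_path, "w").close()
    return True
```

Run from cron, a call such as rotate("/var/log/myapp.log", keep=4, max_bytes=1_000_000) would keep at most four compressed archives; the path and limits are hypothetical.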
II. Syslogd and klogd
Typical Linux systems utilize a syslog daemon to capture log messages from userspace applications and write them to text-based log files or send them to a logging host over the network. The syslogd daemon is often accompanied by a klogd application which is designed to capture and log kernel messages.
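On the application side, programs hand their messages to syslog rather than managing log files themselves. Below is a minimal sketch using Python's standard logging.handlers.SysLogHandler, assuming a logging host that accepts syslog over UDP (port 514 is the conventional default); the "myapp" tag, the logger naming and the daemon facility are illustrative choices. Each datagram starts with a <PRI> value that encodes the message's facility and priority.

```python
import logging
import logging.handlers


def make_remote_logger(host: str, port: int = 514) -> logging.Logger:
    """Return a logger that forwards records to a syslog daemon over UDP."""
    handler = logging.handlers.SysLogHandler(
        address=(host, port),  # a (host, port) tuple means UDP by default
        facility=logging.handlers.SysLogHandler.LOG_DAEMON,
    )
    # "myapp:" is a hypothetical program tag placed in front of each message.
    handler.setFormatter(logging.Formatter("myapp: %(message)s"))
    logger = logging.getLogger(f"remote-syslog:{host}:{port}")
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger
```

To log to the local daemon instead, SysLogHandler also accepts a Unix socket path such as "/dev/log" as its address.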
The behavior of the syslog daemon is configured via the /etc/syslog.conf configuration file. Every message captured by syslog is categorized by facility (the kind of program or subsystem that generated it) and priority (its severity). Based on these two attributes, messages can be written to particular log files, sent to logging hosts, or dropped completely.
Facilities:
- auth
- authpriv
- cron
- daemon
- kern
- lpr
- mail
- mark
- news
- syslog
- user
- uucp
- local0 through local7
Priorities (from least to most severe):
- debug
- info
- notice
- warning or warn
- err or error
- crit
- alert
- emerg or panic
List of syslog facilities and priorities.
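The selection logic can be modeled in a few lines. The sketch below is a simplified, hypothetical matcher for the left-hand side of a syslog.conf rule such as "*.info;mail.none /var/log/messages": a facility list, a dot, and a priority that selects that level and anything more severe. It uses the RFC 5424 severity numbers (0 for emerg through 7 for debug), ignores the action column, and does not handle implementation-specific modifiers such as "=" or "!".

```python
# RFC 5424 severity numbers: a lower number means more severe.
SEVERITY = {
    "emerg": 0, "panic": 0,
    "alert": 1,
    "crit": 2,
    "err": 3, "error": 3,
    "warning": 4, "warn": 4,
    "notice": 5,
    "info": 6,
    "debug": 7,
}


def selector_matches(selector: str, facility: str, priority: str) -> bool:
    """Return True if a message with this facility/priority is selected.

    Supports 'fac1,fac2.prio' parts joined by ';', '*' as a wildcard for
    facility or priority, and 'none' to exclude a facility. Later parts
    override earlier ones for the facilities they name.
    """
    selected = False
    for part in selector.split(";"):
        facilities, level = part.strip().rsplit(".", 1)
        names = facilities.split(",")
        if "*" not in names and facility not in names:
            continue  # this part says nothing about the message's facility
        if level == "none":
            selected = False
        elif level == "*":
            selected = True
        else:
            # 'mail.warning' selects warning and anything more severe.
            selected = SEVERITY[priority] <= SEVERITY[level]
    return selected
```

With this model, "*.info;mail.none" accepts a cron.notice message but rejects every mail message, even mail.crit, because the later "mail.none" part overrides the wildcard for that facility.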
III. Syslog-ng
The syslog-ng application aims to be an enhanced drop-in replacement for the traditional syslog daemon. It provides many of the same features as the standard syslog daemon, but adds features such as advanced filtering based on message content, remote logging over TCP as well as UDP, and the ability to write log messages to a database such as MySQL or PostgreSQL. More recent SUSE-based systems such as SLES10 have switched to syslog-ng as the default syslog server.
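syslog-ng expresses content-based filtering declaratively in its own configuration file. Purely to illustrate the idea, the Python sketch below routes each message to every destination whose regular expression matches the message text. The rule names and patterns are hypothetical and this is not syslog-ng syntax.

```python
import re
from typing import Dict, List

# Hypothetical routing rules: destination name -> regex applied to message text.
RULES: Dict[str, re.Pattern] = {
    "auth-failures": re.compile(r"Failed password|authentication failure"),
    "disk-errors": re.compile(r"I/O error|EXT3-fs error"),
}


def route(message: str, rules: Dict[str, re.Pattern] = RULES) -> List[str]:
    """Return every destination whose pattern matches, or ['default'] if none do."""
    hits = [dest for dest, pattern in rules.items() if pattern.search(message)]
    return hits or ["default"]
```

Unlike the facility/priority selectors of the classic daemon, the decision here depends on what the message says rather than on who sent it.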
IV. Viewing Logs
Most log files on a Linux system are stored as plain text, which means they can be viewed and parsed with a number of different command-line tools. Typical utilities such as tail, head, grep, cat, less, more, sed and awk can be used to view and filter log messages from the command line.
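As an example of this kind of filtering, a pipeline like grep sshd /var/log/messages | tail -n 5 shows the five most recent sshd entries. The Python sketch below is a rough equivalent for use inside scripts; the function name grep_tail is hypothetical.

```python
import re
from collections import deque
from typing import Iterable, List


def grep_tail(lines: Iterable[str], pattern: str, n: int = 10) -> List[str]:
    """Return the last n lines matching pattern, like `grep PATTERN | tail -n N`."""
    regex = re.compile(pattern)
    # A bounded deque keeps only the n most recent matches in memory.
    last = deque((line.rstrip("\n") for line in lines if regex.search(line)), maxlen=n)
    return list(last)
```

It accepts any iterable of lines, so an open file object for /var/log/messages (a RedHat-style path; adjust for your system) works as well as a list.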
There are also numerous utilities designed to parse and view log files via a GUI or web browser. Some utilities are even designed to handle specific log formats, such as those generated by Linux’s Netfilter firewall subsystem.
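Netfilter log entries are comparatively easy to parse because each logged packet is written as space-separated KEY=VALUE tokens (IN, OUT, SRC, DST, PROTO, SPT, DPT and so on), plus bare flags such as SYN. Here is a minimal sketch of such a parser; the function name is hypothetical, and the exact set of fields varies with the protocol and kernel version.

```python
from typing import Dict


def parse_netfilter(line: str) -> Dict[str, str]:
    """Extract KEY=VALUE fields from a Netfilter (iptables LOG target) line.

    Bare flags such as SYN or DF are recorded with the value 'true';
    empty values such as 'OUT=' are kept as ''.
    """
    fields: Dict[str, str] = {}
    # Skip the syslog header and any log prefix that precede the first IN= token.
    start = line.find("IN=")
    if start == -1:
        return fields
    for token in line[start:].split():
        key, sep, value = token.partition("=")
        fields[key] = value if sep else "true"
    return fields
```

For a made-up line such as "kernel: DROP IN=eth0 OUT= SRC=10.0.0.5 DST=192.168.1.10 PROTO=TCP SPT=40000 DPT=22 SYN" (where "DROP" stands for whatever --log-prefix the rule sets), the result maps SRC to "10.0.0.5", DPT to "22", OUT to "" and SYN to "true".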
GNOME System Log Viewer
The GNOME desktop includes a GTK-based log viewing application that displays system logs in a graphical interface.
YaST System Log Module
SUSE-based systems using YaST typically include a module called View System Log (internally named view_anymsg). Like the GNOME System Log Viewer, the YaST module lets an administrator view many of the system logs without using the command line.
V. Log Analysis
LogWatch
The logwatch utility is designed to parse system logs, locate any entries that might indicate a security threat or system failure, and send an email report to a designated address. LogWatch is distributed with RedHat Enterprise Linux systems. The following is an excerpt from its RPM description:
“LogWatch is a customizable log analysis system. LogWatch parses through your system's logs for a given period of time and creates a report analyzing areas that you specify, in as much detail as you require. LogWatch is easy to use and claims that it will work right out of the package on almost all systems. Note that LogWatch now analyzes Samba logs.”
LogWatch is typically executed periodically via cron, a task scheduling application.
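At its core, a LogWatch-style report scans a period's worth of log lines and summarizes the entries of interest. The sketch below counts failed SSH password attempts per source address; it is not LogWatch's code, the function name is hypothetical, and the regular expression assumes OpenSSH's usual "Failed password for ... from ..." wording.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed OpenSSH message format; adjust the pattern for other sshd versions.
FAILED = re.compile(
    r"sshd\[\d+\]: Failed password for (?:invalid user )?\S+ from (?P<src>\S+)"
)


def ssh_report(lines: Iterable[str]) -> str:
    """Summarize failed SSH logins per source address as a short text report."""
    by_source = Counter()
    for line in lines:
        m = FAILED.search(line)
        if m:
            by_source[m.group("src")] += 1
    if not by_source:
        return "sshd: no failed logins"
    out = ["sshd: failed logins by source"]
    for addr, count in by_source.most_common():  # busiest sources first
        out.append(f"   {addr}: {count} time(s)")
    return "\n".join(out)
```

The returned text could then be mailed to the designated address, which is the step LogWatch performs for its own reports.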
LogCheck
The logcheck utility is part of the Sentry Tools project, which also includes portsentry, a utility designed to detect port scans. Like LogWatch, logcheck parses system log files, finds log entries that may indicate security problems, and sends an email to a preconfigured address. It also relies on cron to run periodically.
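Because logcheck runs from cron, it only needs to look at entries added since its previous run (the package ships a helper called logtail for this). The Python sketch below imitates that workflow: it remembers a byte offset in a state file, picks out suspicious lines, and builds an alert email. The offset-file scheme, the patterns, the sender address and the function names are all assumptions for illustration, not logcheck's actual files or rules.

```python
import os
import re
from email.message import EmailMessage
from typing import List, Optional

# Hypothetical "security" patterns; real deployments tune these carefully.
SUSPICIOUS = re.compile(r"Failed password|refused connect|attackalert", re.IGNORECASE)


def new_lines(log_path: str, offset_path: str) -> List[str]:
    """Return lines appended to log_path since the offset saved in offset_path."""
    offset = 0
    if os.path.exists(offset_path):
        with open(offset_path) as f:
            offset = int(f.read().strip() or 0)
    if offset > os.path.getsize(log_path):
        offset = 0  # the log shrank (rotated or truncated); start over
    with open(log_path, "rb") as f:
        f.seek(offset)
        data = f.read()
        end = f.tell()
    with open(offset_path, "w") as f:
        f.write(str(end))  # remember where this run stopped
    return data.decode("utf-8", errors="replace").splitlines()


def build_alert(lines: List[str], to_addr: str, host: str) -> Optional[EmailMessage]:
    """Build (but do not send) an alert email for suspicious lines, if any."""
    hits = [line for line in lines if SUSPICIOUS.search(line)]
    if not hits:
        return None
    msg = EmailMessage()
    msg["Subject"] = f"{host} security events: {len(hits)}"
    msg["From"] = f"root@{host}"  # hypothetical sender address
    msg["To"] = to_addr
    msg.set_content("\n".join(hits) + "\n")
    return msg
```

Sending is left to the caller; assuming a local mail server, smtplib.SMTP("localhost").send_message(msg) would deliver the message.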
That does it for the Log Management and Analysis section. We have one last blog post to go, and we certainly hope you found the information we have captured useful. If you’re running any special toolsets or custom scripts for log management and analysis and would like to share your experience with us, please send us your feedback. As always, THANK YOU for tuning into Port25.