CBASS Troubleshooting Guide


This document is intended to help beamline staff who are actively troubleshooting a problem with CBASS. We'll look for easy fixes first, then dig deeper if necessary. This document assumes that you have tried to restart CBASS in order to fix the problem, and that has failed.


CBASS controls a diffractometer, a detector, beamline motors, and an optional crystal mounting robot, and it interacts with the PXDB database.


[Figure: CBASS architecture diagram (cb_trouble_arch-50.png)]



Most problems with CBASS can be traced back to either detector or diffractometer control. Less frequently, the cause is a networking glitch or a problem with beamline motor control (EPICS).



Specific situations:


"hanging on dark" or "hanging on readout":


This is typical of a detector communications problem. On beamlines with ADSC detectors, CBASS communicates with ADSC processes (e.g. det_api_workstation, ccd_image_gather, ccd_xform) that typically run on the same machine as CBASS. These processes then communicate with a program that runs on the detector control PCs, which are physically connected to the detector. In most cases, a detector control failure is caused by a problem with a detector control PC or the program running on it.


ADSC detector controlled by multiple PCs (315s, 210s, upgraded Q4s)


1) Exit CBASS

2) Using the monitor switch on the ADSC rack, cycle through the screens for each detector control computer and see if one of them looks different from the others. You may find that one has crashed. If so, reset that computer and start the "remote detector operations" program.

3) Restart CBASS


If a problem with a particular PC isn't obvious, cycle through the screens again and kill/restart the "remote detector operations" programs. Then restart CBASS.


ADSC detector controlled by a single PC (probably an older Q4)


1) Exit CBASS

2) Kill and restart the control program on the ADSC PC

3) Restart CBASS

If this doesn't correct the problem, then:


1) Exit CBASS

2) Kill the control program on the ADSC PC

3) Reboot ADSC PC

4) Restart ADSC control process on ADSC PC

5) Restart CBASS



---------------------------------------------------------------------------------


Loss of diffractometer control:


This is often seen as a failure of CBASS to start or to open the shutter or rotate an axis. CBASS will sometimes identify the problem at startup as "Diffractometer Communications Error".


The first thing to try is to exit CBASS, cycle the power on the diffractometer, then restart CBASS. If that doesn't work, tail the logfile ~/compumotor_outfile.txt (tail -f ~/compumotor_outfile.txt) and look for "timeout" errors. If they're present, confirm that the serial port connections on the control computer and the diffractometer are secure. The serial port used for diffractometer control is specified in $CONFIGDIR/cbass_env.txt as $GON_PORT_NAME. It is something like "/dev/ttyS1". If you see write errors in the logfile, do an "ls -l" on the serial port and make sure you have write permissions. Write permissions shouldn't disappear, but they have done so after reboots that followed operating system upgrades. Here's an example of "ls -l /dev/ttyS1" output:


crwxrwxrwx    1 root     uucp       4,  65 Mar 24  2004 /dev/ttyS1


It's important to see the last "w", which means anyone can write there. As root, "chmod 777 /dev/ttyS1" would restore write permissions.
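The same check can be scripted. The following is a minimal Python sketch and is not part of CBASS: the function name world_writable is made up for illustration, and it assumes that GON_PORT_NAME is set in your shell environment (it falls back to the example port used above if it is not).

```python
import os
import stat


def world_writable(path):
    """Return True if the 'other' write bit is set (the last 'w' in ls -l)."""
    mode = os.stat(path).st_mode
    return bool(mode & stat.S_IWOTH)


if __name__ == "__main__":
    # Assumption: GON_PORT_NAME is exported in this shell; otherwise use the
    # example port name from this guide.
    port = os.environ.get("GON_PORT_NAME", "/dev/ttyS1")
    if not os.path.exists(port):
        print(port, "does not exist")
    elif world_writable(port):
        print(port, "is world-writable")
    else:
        print(port, "is NOT world-writable")
```

This only reports the state of the port; restoring the permissions still has to be done as root.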


Another rare problem has occurred when an "h machine" reboot causes a change in the ordering of the USB ports. This is only a potential problem when we use a USB port with a serial converter to communicate with the diffractometer. In those cases, $GON_PORT_NAME will have "USB" in its name (e.g. "/dev/ttyUSB1"). To work around this, you'll have to guess what the new serial port name is for CBASS, and set $GON_PORT_NAME to that name. For example, you might have to change

"setenv GON_PORT_NAME /dev/ttyUSB0" to

"setenv GON_PORT_NAME /dev/ttyUSB1"

if the system changed its name assignments to the ports.

If other programs use a USB serial port, such as the X29 gap control, those programs will also need to have the port numbers changed. An alternative to changing port names would be to swap cable positions while noting carefully each change.

Again, this is a rare and somewhat complicated problem.
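When guessing the new port name, it can help to see which USB serial devices the system currently exposes. Here is a small Python sketch, assuming the converters show up as /dev/ttyUSB* as in the example above. The function name usb_serial_ports is hypothetical, and the listing cannot tell you which device is wired to the diffractometer.

```python
import glob
import os


def usb_serial_ports(dev_dir="/dev"):
    """Return the sorted ttyUSB* device paths currently present under dev_dir."""
    return sorted(glob.glob(os.path.join(dev_dir, "ttyUSB*")))


if __name__ == "__main__":
    ports = usb_serial_ports()
    if ports:
        for port in ports:
            print(port)
    else:
        print("no ttyUSB devices found under /dev")
```

Comparing this list before and after a reboot (or before and after swapping a cable) narrows down which names have moved.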


------------------------------------------------------------------

"EPICS motor Initialization Error"


The symptom is loss of beamline motor control. On an attempted restart of CBASS, you may see "Epics motor Init Error". The first thing to do is to confirm that EPICS is running and is serving the motors. The best way to do this is to probe a channel, using a medm tool called "probe". On the "h" machine, type "probe <beamline_id>:<real motor code>". For a list of motors controlled by CBASS on each beamline, and the beamline id, you can look at $CONFIGDIR/epx.db. You can try "probing" any real motor in that file. For example, on x12c, "probe x12c:mon" would pop up a medm screen that should have the current position of the monochromator (probably in degrees). The x29 monochromator is not controlled by EPICS, so you could do "probe x29a:tzo" to test reading a table motor.

If the medm screen has no value, this indicates a failure to communicate with EPICS. The best thing to do is to reset the crate, give it some time, then reattempt the probe. If you get a reading, then you've probably fixed the problem, and you can restart CBASS. If there's still no reading, this is a more serious problem that is most likely network related.

------------------------------------------------------------

CBASS Hung after moving an Epics motor


There are instances in which EPICS thinks a motor is moving when it isn't. This will hang CBASS. CBASS looks at an EPICS process variable named "[beamline_name]:alldone". This is a logical AND of the "dmov" (done moving) fields of all of the EPICS motors. If EPICS thinks a motor is moving, "alldone" will be 0, and CBASS will hang waiting for it to go to 1. The typical scenario is that you do something like a realign or a mono scan but the cbass prompt never returns. You can determine which motor is moving by opening up medm screens (motor8, motor8_2,...) until you find the motor that's stuck in the moving state. You can usually free it up with a small tweak. On some beamlines, there is a utility that you can execute on the unix command line on the "h" machine to diagnose this. Type "check_motors", and it will tell you the state of alldone, and the name of any moving motors.
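To make the "alldone" logic concrete, here is a small Python illustration. It does not talk to EPICS. The motor names and values in the snapshot are invented for the example, and the two function names are hypothetical.

```python
def alldone(dmov):
    """Logical AND of every motor's dmov field: 1 only when no motor is moving."""
    return int(all(value == 1 for value in dmov.values()))


def moving_motors(dmov):
    """Names of motors whose dmov field is 0, i.e. that EPICS believes are moving."""
    return sorted(name for name, value in dmov.items() if value == 0)


# Hypothetical snapshot of dmov fields: "tzo" is stuck in the moving state.
snapshot = {"mon": 1, "tzo": 0}
print(alldone(snapshot))        # 0
print(moving_motors(snapshot))  # ['tzo']
```

A single motor with dmov at 0 is enough to hold "alldone" at 0, which is why one stuck motor hangs CBASS.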


"probe" is another useful facility. It comes with the EPICS distributions and allows you to look at any process variable. For example, on X26c, typing "probe x26c:alldone" on x26c-h would tell you if EPICS thinks there are any moving motors.

-----------------------------------------------------------

Database Access Trouble


CBASS depends on a connection to PXDB, which resides on navajo.bio in building 421. Should this machine become unavailable for any reason, CBASS will not be able to collect data on any of the beamlines. This has only happened once. PXDB can be bypassed easily by editing $CONFIGDIR/cbass_env.txt and setting "HAS_DB" to 0, then restarting CBASS. This will allow operations to proceed while database access is addressed offline.
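If you would rather script the edit than open an editor, something like the Python sketch below could work. It assumes that cbass_env.txt consists of csh-style "setenv NAME value" lines, which is a guess based on the startup script sourcing it with csh; check the real file first and keep a backup. The function name set_env_value is hypothetical.

```python
import re


def set_env_value(text, name, value):
    """Return text with each 'setenv NAME ...' line rewritten to use value.

    Assumes csh-style 'setenv NAME value' lines; all other lines are untouched.
    Anything after the variable name on a matching line is replaced.
    """
    pattern = re.compile(
        r"^([ \t]*setenv[ \t]+" + re.escape(name) + r"[ \t]+).*$",
        re.MULTILINE,
    )
    return pattern.sub(lambda match: match.group(1) + str(value), text)


# Made-up file contents, for illustration only.
sample = "setenv HAS_DB 1\nsetenv HAS_BEAMLINE 1\n"
print(set_env_value(sample, "HAS_DB", 0), end="")
# setenv HAS_DB 0
# setenv HAS_BEAMLINE 1
```

The function only transforms text; reading the file, writing it back, and restarting CBASS are left to you.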


-------------------------------------------------


Loss of "/h" filesystem (midgard)


The machines that run CBASS at the beamlines ("-h" computers) mount the "/h" filesystem from midgard in order to maintain the html logs in a persistent and centralized area. If the "/h" filesystem becomes unavailable, the software may hang. This is easy to work around by editing $CONFIGDIR/cbass_env.txt and setting "HTML_LOG_ROOT" to somewhere other than "/h". For example, if you were taking data in /img09/data1/pxuser/skinner, you could change HTML_LOG_ROOT to "/img09/data1/pxuser". When you restart CBASS, the html log root will then be in that directory.


-------------------------------------------------------


General Troubleshooting


CBASS maintains log files that are helpful in troubleshooting. On beamlines with Crystallogic diffractometers (all beamlines except x26c), "compumotor_outfile.txt" can be found in the pxuser home directory on the "h machine". If you're having trouble related to diffractometer control, it's worth "tailing" it in a separate window while trying to start CBASS. Special attention should be given to "Timeout detected" messages. CBASS will not have control of the diffractometer if these messages are appearing. These are usually related to a failure of the Crystallogic program running on the compumotor, or a serial cable being disconnected.
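If the log is long, a quick scan can be easier than reading it by eye. The Python sketch below lists every line that mentions a timeout. The function name timeout_lines is hypothetical, and the match is deliberately loose (any line containing "timeout", in any case) because the exact wording of the messages may vary.

```python
def timeout_lines(lines):
    """Return (line_number, text) pairs for lines that mention a timeout."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if "timeout" in line.lower()
    ]


if __name__ == "__main__":
    import os

    # Log location as described in this guide (pxuser home directory).
    path = os.path.expanduser("~/compumotor_outfile.txt")
    if os.path.exists(path):
        with open(path, errors="replace") as log:
            for number, text in timeout_lines(log):
                print(number, text)
    else:
        print(path, "not found")
```

Unlike "tail -f", this looks at the whole file once, so it is better suited to checking what happened during a startup attempt that has already failed.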


In the directory in which you are running cbass, you can find "cbass_outfile.txt". This is the output of the cbass server. The things to look for here are crashes that produce a lot of output from Python. These stack dumps provide some idea of what is failing.
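To pull the most recent stack dump out of a large cbass_outfile.txt, a sketch along these lines may help. It relies only on the standard header that Python prints at the start of a traceback. The function name last_traceback is hypothetical, and the rule used to find the end of the dump (the first non-indented line after the header) is a simplification.

```python
def last_traceback(lines):
    """Return the lines of the last Python traceback in lines, or [] if none.

    A traceback starts at 'Traceback (most recent call last):' and, in this
    sketch, ends at the first following line that is not indented, which is
    normally the exception line itself.
    """
    start = None
    for index, line in enumerate(lines):
        if line.startswith("Traceback (most recent call last):"):
            start = index
    if start is None:
        return []
    block = [lines[start].rstrip("\n")]
    for line in lines[start + 1:]:
        block.append(line.rstrip("\n"))
        if not line.startswith((" ", "\t")):
            break
    return block
```

Pass it a list of lines, for example the result of readlines() on the open log file. The last line of the returned block names the exception, which is usually the most useful clue.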


CBASS is fairly easy to troubleshoot if you understand what it does. When someone types "cbass" at a beamline, they execute a script. To find the script, type "alias cbass". This will tell you what is being called. In many cases, the result is "cbass2k4". Typing "which cbass2k4" will tell you where the script is. You can then use an editor to look at what CBASS does. Here's the CBASS script that runs on X12C:


#!/bin/csh -f
source $CONFIGDIR/cbass_env.txt
killall -KILL tkmessage
killall -KILL gui.py
echo "kill cbass_server"
killall -KILL cbass_server.py
echo "kill gon_server"
killall -KILL compumot_serv.py
echo "kill detector_server"
killall -KILL q315_server
killall -KILL rsh
if ($HAS_BEAMLINE == 1) then
  killall -KILL cb_mon_mots.py
  killall -KILL cb_plot_monitor.py
  killproc xmgrace
  killproc ss_arch
  killall -KILL gv
  killall -KILL gs
  killall -KILL lynx
  killall -KILL chooch
endif
jpeg_engine $JPEG_ENGINE_PORT >> & /dev/null&
sleep 1
if ($HAS_BEAMLINE == 1) then
  $CBHOME/cb_mon_mots.py>>/dev/null&
  ss_arch -icon ~epics/expmnt/ioc/save_sets/x12c_saveall.cfg>>/dev/null&
  $CBHOME/cb_plot_monitor.py>>/dev/null&
endif
rsh -n $GON_HOST "$CONFIGDIR/startgon" &
sleep 4
if ($DETECTOR_OFFLINE == 0) then
  echo "Start ADSC Detector Servers."
  setenv Q4HOME /home/pxsys/q315
  source $Q4HOME/LOGIN_files/log_315
  /usr/local/bin/start_315_procs
  sleep 1
endif
sleep 10
$CBHOME/cbass_server.py>> & cbass_outfile.txt&
sleep 10
$CBHOME/gui.py


Most of the lines are associated with setup and environment. The lines that matter most are the ones that start the major components. In order, these are: the diffractometer (the "rsh -n $GON_HOST ..." line that runs startgon), the detector ("/usr/local/bin/start_315_procs"), the cbass server ("$CBHOME/cbass_server.py"), and the cbass gui ("$CBHOME/gui.py"). By carefully cutting and pasting (in order) from that script to the command line on the "h" machine, you can usually identify what is causing the problem.


X29 USB/Serial Port Procedures


If you need to disconnect/reconnect the large USB/Serial Port box in the X29 rack, or cycle its power, follow the instructions below.


1) Make sure the power is on to the IO Box

2) Make sure that the white USB link is plugged in to the IO Box and to x29-h

3) "su -" to become root

4) Run "./fixUSBports" and wait

5) Log out of root and restart CBASS



X29 Specific - Labview Monochromator Server


You may see a message on CBASS startup that says: "ERROR connecting to CBASS server!" This can be caused by a number of things, but if nothing specific is identified, it's worth trying the following:


1) Stop macro on the labview PC

2) Type: "start_lv_server" in a window on x29-h and wait for it to say it's ready.

3) Restart macro on labview PC

4) Restart CBASS