
CBASS Troubleshooting Guide

This document is intended to help beamline staff who are actively troubleshooting a problem with CBASS. We'll look for easy fixes first, then dig deeper if necessary. It assumes that you have already tried restarting CBASS and that the restart did not fix the problem.

CBASS controls a diffractometer, a detector, the beamline motors, and an optional crystal mounting robot, and it interacts with the PXDB database.


Most problems with CBASS can be traced back to either detector or diffractometer control. Less frequently, the cause is a networking glitch or a problem with beamline motor control (EPICS). It is strongly advised to call a px-operator to assist in solving the issue.

Specific situations:

"hanging on dark" or "hanging on readout":

This is typical of a detector communications problem. On beamlines with ADSC detectors, CBASS communicates with ADSC processes (e.g. det_api_workstation, ccd_image_gather, ccd_xform) that typically run on the same machine that CBASS runs on. These processes then communicate with a program that runs on the detector control PCs that are physically connected to the detector. In most cases, a detector control failure is caused by a problem with a detector control PC or the program running on it.
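Before working through the attempts below, it can help to see which of the ADSC processes are actually running on the CBASS machine. The following is a minimal Python sketch, not part of CBASS. The process names are the ones listed above; the function name and the assumption that the command name is the last column of the process listing (as in "ps -e" output) are illustrative only.

```python
# Process names taken from the text above; everything else is an assumption.
EXPECTED = ("det_api_workstation", "ccd_image_gather", "ccd_xform")


def missing_processes(ps_output, expected=EXPECTED):
    """Return the expected process names that do not appear in ps-style output."""
    running = set()
    for line in ps_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        # Assume the command is the last column; keep only the basename of a path.
        running.add(fields[-1].rsplit("/", 1)[-1])
    return [name for name in expected if name not in running]
```

To use it on a live machine, the output of "ps -e" could be captured (for example with subprocess.run(["ps", "-e"], capture_output=True, text=True).stdout) and passed in. An empty list means all three processes were found; it says nothing about the detector control PCs themselves.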

ADSC detector hangs:

Attempt #1

1) Exit CBASS

2) Type "kill_detector" in an "h-machine" window

3) Restart CBASS

Attempt #2

1) Exit CBASS

2) Type "kill_detector" in an "h-machine" window

3) Using the monitor switch on the ADSC rack, cycle through the screens for each detector control computer and see if one of them looks different from the others. You may find that one has crashed. If so, kill and restart the "remote detector operations" program on the one that looks different.

4) Restart CBASS

Attempt #3

If a problem with a particular PC isn't obvious in Attempt #2, execute "kill_detector", then cycle through the screens again and kill/restart all of the "remote detector operations" programs. Then restart CBASS. 

Attempt #4

This should be pretty rare. Same as "Attempt #3", but reboot all of the detector PCs before restarting the remote ops programs. DO NOT POWERCYCLE THESE! Just reboot.

-----------------------------------------------------------------------------

Loss of diffractometer control:

This is often seen as a failure of CBASS to start or to open the shutter or rotate an axis. CBASS will sometimes identify the problem at startup as "Diffractometer Communications Error".

As a first attempt, exit CBASS, cycle the power on the diffractometer, then restart CBASS. If that doesn't work, tail the logfile ~/compumotor_outfile.txt (tail -f ~/compumotor_outfile.txt) and look for "timeout" errors. If they're present, confirm that the serial port cable connections on the control computer and the diffractometer are secure. The serial port used for diffractometer control is specified in $CONFIGDIR/cbass_env.txt as $GON_PORT_NAME; it is something like "/dev/ttyS1".

If you see write errors in the logfile, do an "ls -l" on the serial port and make sure you have write permissions. Write permissions shouldn't disappear, but they have done so after reboots following operating system upgrades. Here's an example of "ls -l /dev/ttyS1" output:

crwxrwxrwx    1 root     uucp       4,  65 Mar 24  2004 /dev/ttyS1

It's important to see the last "w", which means anyone can write there. As root, "chmod 777 /dev/ttyS1" would restore write permissions.
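The same check can be sketched in Python. This is an illustration only, not a CBASS utility: the function names are made up here, and the first function assumes the standard "ls -l" layout of one file-type character followed by three rwx triplets (user, group, other).

```python
import os
import stat


def world_writable_from_ls(mode_field):
    """True if an `ls -l` mode field such as 'crwxrwxrwx' grants write to 'other'."""
    # Index 0 is the file type; 1-3 user, 4-6 group, 7-9 other (r, w, x).
    return len(mode_field) >= 10 and mode_field[8] == "w"


def world_writable(path):
    """True if the file at `path` has the other-write permission bit set."""
    return bool(os.stat(path).st_mode & stat.S_IWOTH)
```

For the example line above, world_writable_from_ls("crwxrwxrwx") is true. On the control computer, world_writable("/dev/ttyS1") would test the port directly, assuming that is the device named by $GON_PORT_NAME.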
**** After a power cycle, most beamlines need to home omega: type 'home_omega' on the CBASS command line.

------------------------------------------------------------------

X29 Specific - LabVIEW Monochromator Server

X29 ONLY! You may see a message on CBASS startup that says:

"ERROR connecting to CBASS server!"

This can be caused by a number of things, but if nothing specific is identified, it's worth making sure the LabVIEW monochromator server is running on x29-h.

Attempt #1

1) Stop and restart the macro on the LabVIEW PC

2) Restart CBASS.

Attempt #2

1) Stop the macro on the LabVIEW PC

2) Type "start_lv_server" in a window on x29-h and wait for it to say it's ready.

3) Restart the macro on the LabVIEW PC

4) Restart CBASS

--------------------------------------------------------------------------------

"EPICS motor Initialization Error"

The symptom is loss of beamline motor control. On an attempted restart of CBASS, you may see "Epics motor Init Error". The first thing to do is to confirm that EPICS is running and is serving the motors. The best way to do this is to probe a channel, using a medm tool called "probe". On the "h" machine, type "probe <beamline_id>:<real motor code>". For a list of the motors controlled by CBASS on each beamline, and the beamline id, look at $CONFIGDIR/epx.db. You can try "probing" any real motor in that file.

For example, on x12c, "probe x12c:mon" would pop up a medm screen that should show the current position of the monochromator (probably in degrees). The x29 monochromator is not controlled by EPICS, so there you could do "probe x29a:tzo" to test reading a table motor. If the medm screen has no value, this indicates a failure to communicate with EPICS. The best thing to do is to reset the crate, give it some time, then reattempt the probe. If you get a reading, then you've probably fixed the problem, and you can restart CBASS. If there's still no reading, this is a more serious problem that is most likely network related.
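The argument to "probe" is simply the beamline id and the motor code joined by a colon. As a small illustration (the helper name is hypothetical and not part of CBASS or EPICS), the command could be assembled in Python like this:

```python
def probe_command(beamline_id, motor_code):
    """Build the argument list for probing one motor, e.g. ['probe', 'x12c:mon']."""
    if not beamline_id or not motor_code:
        raise ValueError("both a beamline id and a motor code are required")
    # The channel name is "<beamline_id>:<motor code>", as in the examples above.
    return ["probe", "%s:%s" % (beamline_id, motor_code)]
```

For x12c and "mon" this gives the same "probe x12c:mon" invocation quoted above; the list could be handed to subprocess.Popen on the "h" machine, assuming "probe" is on the path there.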

------------------------------------------------------------

CBASS Hung after moving an EPICS motor

There are instances in which EPICS thinks there is a motor moving when there isn't. This will hang CBASS. CBASS looks at an EPICS process variable named "[beamline_name]:alldone". This is a logical AND of the "dmov" (done moving) fields of all of the EPICS motors. If EPICS thinks a motor is moving, "alldone" will be 0, and CBASS will hang waiting for it to go to 1. Type "check_motors", and it will tell you the state of alldone and the names of any moving motors.
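The relationship between the per-motor "dmov" values and "alldone" can be sketched in a few lines of Python. This only models the logic described above; it does not talk to EPICS, and the function names and the dictionary of motor names are illustrative assumptions.

```python
def alldone(dmov):
    """Return 1 if every motor reports done-moving (dmov == 1), else 0."""
    return int(all(value == 1 for value in dmov.values()))


def moving_motors(dmov):
    """Return the names of motors whose dmov field is not 1, sorted by name."""
    return sorted(name for name, value in dmov.items() if value != 1)
```

With dmov = {"x25a:mon": 0, "x25a:tzo": 1}, alldone(dmov) is 0 and moving_motors(dmov) names "x25a:mon", which is the kind of answer "check_motors" is described as giving. A single stuck motor is enough to hold "alldone" at 0.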

The typical scenario is that you do something like a realign or a mono scan but the cbass prompt never returns. You can determine which motor is moving by opening up medm screens (motor8, motor8_2,...) until you find the motor that's stuck in the moving state. You can usually free it up with a small tweak. On some beamlines, there is a utility that you can execute on the unix command line on the "h" machine to diagnose this. 

"probe" is another useful facility. It comes with the EPICS distributions and allows you to look at any process variable. For example, on X25, typing "probe x25a:alldone" on x25-h would tell you if EPICS thinks there are any moving motors.
*** Sometimes a motor can be "moving" just because it is trying to get a readback from somewhere else. For example, X25a:mon needs to have a readback from the mgu.

-----------------------------------------------------------

Database Access Trouble

CBASS depends on a connection to PXdb. Should PXdb become unavailable for any reason, CBASS will not be able to collect data on any of the beamlines. This has only happened once. PXdb can be bypassed easily by editing $CONFIGDIR/cbass_env.txt and setting "HAS_DB" to 0, then restarting CBASS. This will allow operations to proceed while database access is addressed offline.
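The edit itself is a one-line change. The sketch below shows it in Python under a loudly stated assumption: because cbass_env.txt is sourced by a csh script, it is assumed here to contain a line of the form "setenv HAS_DB 1". The real file may use a different form (for example "set HAS_DB = 1"), so check it before relying on anything like this. The function name is hypothetical.

```python
import re

# ASSUMPTION: the variable is defined with csh "setenv HAS_DB <value>".
_HAS_DB_LINE = re.compile(r"^([ \t]*setenv[ \t]+HAS_DB[ \t]+)\S+", re.MULTILINE)


def set_has_db(config_text, value):
    """Return config_text with the HAS_DB value replaced by `value`."""
    new_text, count = _HAS_DB_LINE.subn(
        lambda match: match.group(1) + str(value), config_text
    )
    if count == 0:
        raise ValueError("no 'setenv HAS_DB' line found; edit the file by hand")
    return new_text
```

Calling set_has_db(text, 0) bypasses the database and set_has_db(text, 1) restores it once PXdb is available again. In practice a text editor is just as quick; the point is only that nothing else in the file needs to change.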

-------------------------------------------------

General Troubleshooting

CBASS maintains log files that are helpful in troubleshooting. On beamlines with Crystallogic diffractometers (all beamlines except x12b), "compumotor_outfile.txt" can be found in the current working directory or pxuser home directory on the "h machine". If you're having trouble related to diffractometer control, it's worth "tailing" it in a separate window while trying to start CBASS. Special attention should be given to "Timeout detected" messages. CBASS will not have control of the diffractometer if these messages are appearing. These are usually related to a failure of the Crystallogic program running on the compumotor, or a serial cable being disconnected.
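If the log is long, it can be quicker to pull out only the recent "Timeout detected" lines. Here is a minimal Python sketch; the function name and the default window of 200 lines are arbitrary choices for illustration, and the marker string is the one quoted above.

```python
from collections import deque


def recent_timeouts(lines, window=200, marker="Timeout detected"):
    """Return the lines containing `marker` among the last `window` lines."""
    # A bounded deque keeps only the tail of the log, like "tail -n".
    tail = deque(lines, maxlen=window)
    return [line.rstrip("\n") for line in tail if marker in line]
```

An open file object can be passed directly as "lines", for example the result of opening compumotor_outfile.txt in the pxuser home directory. An empty result only means no timeouts were logged in that window, not that the diffractometer is healthy.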

In the directory in which you are running cbass, you can find "cbass_outfile.txt". This is the output of the cbass server. The things to look for here are crashes that produce a lot of output from Python. These stack dumps provide some idea of what is failing.
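Python stack dumps start with the line "Traceback (most recent call last):" and end with an unindented line naming the exception. The sketch below uses that layout to pull the dumps out of the server output. It is an illustration rather than a CBASS tool; the function name is made up, and very old Python versions that print a different header would not be matched.

```python
HEADER = "Traceback (most recent call last):"


def extract_tracebacks(log_text):
    """Return each Python traceback found in log_text as one string."""
    blocks, current = [], None
    for line in log_text.splitlines():
        if line.startswith(HEADER):
            if current is not None:
                blocks.append("\n".join(current))  # previous dump was cut short
            current = [line]
        elif current is not None:
            current.append(line)
            # The first unindented line after the header is the exception summary.
            if not line.startswith((" ", "\t")):
                blocks.append("\n".join(current))
                current = None
    if current is not None:
        blocks.append("\n".join(current))
    return blocks
```

The last line of each returned block is usually the most informative, since it names the exception and its message; the lines above it show which file and function the server was in when it failed.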

CBASS is fairly easy to troubleshoot if you understand what it does. When someone types "cbass" at a beamline, they execute a script. To find the script, type "alias cbass". This will tell you what is being called. In many cases, the result is "cbass2k4". Typing "which cbass2k4" will tell you where the script is. You can then use an editor to look at what CBASS does. Here's the CBASS script that runs on X12C:

#!/bin/csh -f

source $CONFIGDIR/cbass_env.txt

killall -KILL tkmessage
killall -KILL gui.py
echo "kill cbass_server"
killall -KILL cbass_server.py
echo "kill gon_server"
killall -KILL compumot_serv.py
echo "kill detector_server"
killall -KILL q315_server
killall -KILL rsh

if ($HAS_BEAMLINE == 1) then
 killall -KILL cb_mon_mots.py
 killall -KILL cb_plot_monitor.py
 killproc xmgrace
 killproc ss_arch
 killall -KILL gv
 killall -KILL gs
 killall -KILL lynx
 killall -KILL chooch
endif

jpeg_engine $JPEG_ENGINE_PORT >> & /dev/null&
sleep 1

if ($HAS_BEAMLINE == 1) then
 $CBHOME/cb_mon_mots.py>>/dev/null&
 ss_arch -icon ~epics/expmnt/ioc/save_sets/x12c_saveall.cfg>>/dev/null&
 $CBHOME/cb_plot_monitor.py>>/dev/null&
endif

rsh -n $GON_HOST "$CONFIGDIR/startgon" &
sleep 4

if ($DETECTOR_OFFLINE == 0) then
 echo "Start ADSC Detector Servers."
 setenv Q4HOME /home/pxsys/q315
 source $Q4HOME/LOGIN_files/log_315
 /usr/local/bin/start_315_procs
 sleep 1
endif

sleep 10
$CBHOME/cbass_server.py>> & cbass_outfile.txt&
sleep 10
$CBHOME/gui.py

Most of the lines are associated with setup and environment. The lines to note are the four that start the major components. In order, these are: diffractometer (the "rsh -n $GON_HOST ..." line), detector ("/usr/local/bin/start_315_procs"), cbass server ("$CBHOME/cbass_server.py ..."), and cbass gui ("$CBHOME/gui.py"). By carefully cutting and pasting (in order) from that script to the command line on the "h" machine, you can usually identify what is causing the problem.
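The idea behind starting the components by hand is to find the first stage that fails, since later stages depend on earlier ones. The Python sketch below models only that reasoning. The stage names follow the order given above; the function name and the "checks" mapping are hypothetical stand-ins for whatever you observe at each step (for example, whether the log shows timeouts).

```python
# Stage order taken from the text above.
STAGES = ("diffractometer", "detector", "cbass server", "cbass gui")


def first_failing_stage(checks, stages=STAGES):
    """Run each stage's check in order; return the first stage that fails, or None.

    `checks` maps a stage name to a zero-argument callable returning True on success.
    """
    for stage in stages:
        if not checks[stage]():
            return stage  # stop here: later stages depend on this one
    return None
```

Stopping at the first failure matters: a detector or server error seen after the diffractometer stage has already failed may only be a consequence of that earlier failure.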

X29 USB/Serial Port Procedures

If you need to disconnect/reconnect the large USB/Serial Port box in the X29 rack, or cycle its power, follow the instructions below.

1. Make sure the power to the IO Box is on.

2. Make sure that the white USB link is plugged in to the IO Box and to x29-h.

3. "su -" to become root.

4. Run "./fixUSBports" and wait.

5. Log out of root and restart CBASS.