
CBASS
Troubleshooting Guide
This document is intended to help beamline staff who are actively
troubleshooting a problem with CBASS. We'll look for easy fixes first,
then dig deeper if necessary. This document assumes that you have tried
to restart CBASS in order to fix the problem, and that has failed.
CBASS controls a diffractometer, a detector, beamline motors, and an optional crystal mounting robot, and it interacts with the PXdb database.
Most problems with CBASS can be traced back to either detector or diffractometer control. Less frequently, the cause is a networking glitch or a problem with beamline motor control (EPICS). It is strongly advised to call a px-operator to assist in solving the issue.
Specific situations:
"hanging on dark" or "hanging on readout":
This is typical of a detector communications problem. On beamlines with ADSC detectors, CBASS communicates with ADSC processes (e.g. det_api_workstation, ccd_image_gather, ccd_xform) that typically run on the same machine as CBASS. These processes in turn communicate with a program running on the detector control PCs, which are physically connected to the detector. In most cases, a detector control failure is caused by a problem with a detector control PC or the program running on it.
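Before working through the attempts below, it can help to confirm whether the ADSC processes named above are running at all. The following is a minimal Python sketch, not part of CBASS: the helper names are hypothetical, the process list is taken from the examples in the text (the exact set on a given beamline is an assumption), and it relies on the standard "ps -e -o comm=" options.

```python
import subprocess

# Process names taken from the examples in this guide; the exact set on a
# given beamline is an assumption.
ADSC_PROCESSES = ("det_api_workstation", "ccd_image_gather", "ccd_xform")


def missing_processes(ps_output, expected=ADSC_PROCESSES):
    """Return the expected process names that do not appear in ps output.

    ps_output is text with one command name per line, as printed by
    `ps -e -o comm=`.
    """
    running = {line.strip() for line in ps_output.splitlines() if line.strip()}
    # Linux truncates command names to 15 characters, so compare both forms.
    return [n for n in expected if n not in running and n[:15] not in running]


def list_running_commands():
    """Ask ps for the command name of every process on this machine."""
    result = subprocess.run(
        ["ps", "-e", "-o", "comm="],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling missing_processes(list_running_commands()) on the machine that runs CBASS returns the names that were not found. An empty list only means the local processes exist; it says nothing about the detector control PCs.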
ADSC detector hangs:
Attempt #1
1) Exit CBASS
2) Type "kill_detector" in an "h-machine" window
3) Restart CBASS
Attempt #2
1) Exit CBASS
2) Type "kill_detector" in an "h-machine" window
3) Using the monitor switch on the ADSC rack, cycle through the screens for each detector control computer and see if one of them looks different than the others. You may find that one has crashed. If so, then kill and restart the "remote detector operations" program for the one that looks different.
4) Restart CBASS
Attempt #3
If a problem with a particular PC isn't obvious in Attempt #2, execute "kill_detector", then cycle through the screens again and kill/restart all of the "remote detector operations" programs. Then restart CBASS.
Attempt #4
This should be pretty rare. Same as "Attempt #3", but reboot all of the detector PCs before restarting the remote ops programs. DO NOT POWERCYCLE THESE! Just reboot.
-----------------------------------------------------------------------------
Loss of diffractometer control:
This is often seen as a failure of CBASS to start, to open the shutter, or to rotate an axis. CBASS will sometimes identify the problem at startup as "Diffractometer Communications Error".
First, exit CBASS, cycle the power on the diffractometer, then restart CBASS. If that doesn't work, tail the logfile ~/compumotor_outfile.txt (tail -f ~/compumotor_outfile.txt) and look for "timeout" errors. If they're present, confirm that the serial port cable connections on the control computer and the diffractometer are secure. The serial port used for diffractometer control is specified in $CONFIGDIR/cbass_env.txt as $GON_PORT_NAME. It is something like "/dev/ttyS1".
If you see write errors in the logfile, do an "ls -l" on the serial port and make sure you have write permissions. Write permissions shouldn't disappear, but they have done so after reboots following operating system upgrades. Here's an example of "ls -l /dev/ttyS1" output:
crwxrwxrwx 1 root uucp 4, 65 Mar 24 2004 /dev/ttyS1
It's important to see the last "w", which means anyone can write there. As root, "chmod 777 /dev/ttyS1" would restore write permissions.
****After a power cycle, most beamlines need to home omega: type 'home_omega' on the CBASS command line.
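The serial port permission check described above (the last "w" in the "ls -l" output) can also be done from Python. This is a minimal sketch using only the standard os and stat modules; the function names are hypothetical and nothing here is part of CBASS.

```python
import os
import stat


def world_writable(mode):
    """True if the 'other' write bit (the last "w" in ls -l) is set."""
    return bool(mode & stat.S_IWOTH)


def describe_port(path):
    """Return (ls-style mode string, world-writable flag) for a device."""
    mode = os.stat(path).st_mode
    return stat.filemode(mode), world_writable(mode)
```

For example, describe_port("/dev/ttyS1") uses the example port from this guide; the port on a given beamline is whatever $GON_PORT_NAME is set to.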
------------------------------------------------------------------
X29 Specific - LabVIEW Monochromator Server
X29 ONLY! You may see a message on CBASS startup that says:
"ERROR connecting to CBASS server!" This can be caused by a number of things, but if nothing specific is identified, it's worth making sure the LabVIEW monochromator server is running on x29-h.
Attempt #1
1) Stop and restart the macro on the LabVIEW PC
2) Restart CBASS
Attempt #2
1) Stop the macro on the LabVIEW PC
2) Type "start_lv_server" in a window on x29-h and wait for it to say it's ready
3) Restart the macro on the LabVIEW PC
4) Restart CBASS
--------------------------------------------------------------------------------
"EPICS motor Initialization Error"
The symptom is loss of beamline motor control. On an attempted restart of CBASS, you may see "Epics motor Init Error". The first thing to do is to confirm that EPICS is running and is serving the motors. The best way to do this is to probe a channel. This is done with a medm tool called "probe".
On the "h" machine, type "probe <beamline_id>:<real motor code>". For a list of motors controlled by CBASS on each beamline, and the beamline id, you can look at $CONFIGDIR/epx.db. You can try "probing" any real motor in that file.
For example, on x12c, "probe x12c:mon" would pop up a medm screen that should show the current position of the monochromator (probably in degrees). The x29 monochromator is not controlled by EPICS, so there you could do "probe x29a:tzo" to test reading a table motor.
If the medm screen has no value, this indicates a failure to communicate with EPICS. The best thing to do is to reset the crate, give it some time, then reattempt the probe. If you get a reading, then you've probably fixed the problem, and you can restart CBASS. If there's still no reading, this is a more serious problem that is most likely network related.
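The probe invocation described above is just the beamline id and motor code joined by a colon. As a small illustration, here is a Python sketch that builds that command; the function names are hypothetical, and it assumes "probe" is on the PATH of the "h" machine and takes the process variable name as its only argument, as in the examples above.

```python
import subprocess


def pv_name(beamline_id, motor_code):
    """Join a beamline id and a motor code into a PV name, e.g. x12c:mon."""
    return f"{beamline_id}:{motor_code}"


def probe_command(beamline_id, motor_code):
    """Build the argument list for the probe tool described above."""
    return ["probe", pv_name(beamline_id, motor_code)]


def launch_probe(beamline_id, motor_code):
    """Start probe without waiting for it; it opens its own window."""
    return subprocess.Popen(probe_command(beamline_id, motor_code))
```

launch_probe("x12c", "mon") would be the equivalent of typing "probe x12c:mon".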
------------------------------------------------------------
CBASS Hung after moving an Epics motor
There are instances in which EPICS thinks a motor is moving when it isn't. This will hang CBASS. CBASS looks at an EPICS process variable named "[beamline_name]:alldone". This is an "anding" of the "dmov" (done moving) fields of all of the EPICS motors. If EPICS thinks a motor is moving, "alldone" will be 0, and CBASS will hang waiting for it to go to 1. Type "check_motors", and it will tell you the state of alldone and the names of any moving motors.
The typical scenario is that you do something like a realign or a mono scan but the CBASS prompt never returns. You can determine which motor is moving by opening up medm screens (motor8, motor8_2,...) until you find the motor that's stuck in the moving state. You can usually free it up with a small tweak. On some beamlines, there is a utility that you can execute on the unix command line on the "h" machine to diagnose this.
"probe" is another useful facility. It comes with the EPICS distributions and allows you to look at any process variable. For example, on X25, typing "probe x25a:alldone" on x25-h would tell you if EPICS thinks there are any moving motors.
***Sometimes a motor can be "moving" just because it is trying to get a readback from somewhere else. For example, X25a:mon needs to have a readback from the mgu.
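To make the "alldone" logic concrete, here is a minimal Python sketch of the "anding" described above. It only models the logic and does not talk to EPICS. The function names and the dictionary of dmov values are hypothetical; in practice the values would come from the motor records.

```python
def alldone(dmov_by_motor):
    """AND together the dmov (done moving) fields: 1 if all done, else 0."""
    return int(all(dmov_by_motor.values()))


def moving_motors(dmov_by_motor):
    """Names of motors whose dmov field is 0, i.e. reported as moving."""
    return sorted(name for name, dmov in dmov_by_motor.items() if not dmov)


# Illustrative values only: one motor stuck in the moving state.
state = {"x25a:mon": 0, "x25a:tzo": 1}
print(alldone(state), moving_motors(state))
```

A single motor with dmov = 0 is enough to hold alldone at 0, which is why one stuck motor hangs CBASS.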
-----------------------------------------------------------
CBASS depends on a connection to PXdb. Should PXdb become unavailable for any reason, CBASS will not be able to collect data on any of the beamlines. This has only happened once. PXdb can be bypassed easily by editing $CONFIGDIR/cbass_env.txt and setting "HAS_DB" to 0, then restarting CBASS. This will allow operations to proceed while database access is addressed offline.
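The HAS_DB edit described above can be scripted. The following Python sketch is an illustration only. It ASSUMES the setting appears in cbass_env.txt as a csh line of the form "setenv HAS_DB 1" (the file is sourced by a csh script, but its actual contents are not shown in this guide), and the function name is hypothetical. Check the real file first.

```python
import re

# ASSUMPTION: cbass_env.txt is sourced by csh, so the setting is taken to
# look like "setenv HAS_DB 1". Check the real file before relying on this.
_HAS_DB = re.compile(r"^([ \t]*setenv[ \t]+HAS_DB[ \t]+)\S+(.*)$", re.MULTILINE)


def disable_db(env_text):
    """Return env_text with HAS_DB set to 0; raise if no setting is found."""
    new_text, count = _HAS_DB.subn(r"\g<1>0\g<2>", env_text)
    if count == 0:
        raise ValueError("no 'setenv HAS_DB' line found")
    return new_text
```

The function works on text rather than on the file itself, so the result can be inspected before anything is written back. Keep a copy of the original file so that HAS_DB can be restored once database access returns.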
-------------------------------------------------
CBASS maintains log files that are helpful in troubleshooting. On beamlines with Crystallogic diffractometers (all beamlines except x12b), "compumotor_outfile.txt" can be found in the current working directory or pxuser home directory on the "h machine". If you're having trouble related to diffractometer control, it's worth "tailing" it in a separate window while trying to start CBASS. Special attention should be given to "Timeout detected" messages. CBASS will not have control of the diffractometer if these messages are appearing. These are usually related to a failure of the Crystallogic program running on the compumotor, or a serial cable being disconnected.
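As an alternative to watching the log by eye, the timeout messages can be pulled out with a few lines of Python. This is a minimal sketch with a hypothetical function name. It matches any line containing "timeout" in any letter case, which covers the "Timeout detected" messages mentioned above; the exact wording of the log lines is otherwise an assumption.

```python
def timeout_lines(log_lines, marker="timeout"):
    """Return (line_number, text) for each line containing the marker.

    Matching ignores letter case; line numbers start at 1.
    """
    wanted = marker.lower()
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(log_lines, 1)
        if wanted in line.lower()
    ]
```

A typical use would be to open compumotor_outfile.txt and pass the file object to timeout_lines, then look at whether the line numbers keep growing while CBASS starts.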
In the directory in which you are running cbass, you can find "cbass_outfile.txt". This is the output of the cbass server. The things to look for here are crashes that produce a lot of output from Python. These stack dumps provide some idea of what is failing.
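Python stack dumps start with the line "Traceback (most recent call last):", so the most recent crash in cbass_outfile.txt can be located by searching for that header. A minimal sketch with a hypothetical function name:

```python
TRACEBACK_HEADER = "Traceback (most recent call last):"


def last_traceback(log_text):
    """Return the text from the last traceback header to the end of the log.

    Returns None when the log contains no Python traceback.
    """
    start = log_text.rfind(TRACEBACK_HEADER)
    return None if start == -1 else log_text[start:]
```

The returned text includes anything the server printed after the traceback, so read it from the top: the last line of the traceback itself names the exception.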
CBASS is fairly easy to troubleshoot if you understand what it does. When someone types "cbass" at a beamline, they execute a script. To find the script, type "alias cbass". This will tell you what is being called. In many cases, the result is "cbass2k4". Typing "which cbass2k4" will tell you where the script is. You can then use an editor to look at what CBASS does. Here's the CBASS script that runs on X12C:
#!/bin/csh -f
source $CONFIGDIR/cbass_env.txt
# kill anything left over from a previous session
killall -KILL tkmessage
killall -KILL gui.py
echo "kill cbass_server"
killall -KILL cbass_server.py
echo "kill gon_server"
killall -KILL compumot_serv.py
echo "kill detector_server"
killall -KILL q315_server
killall -KILL rsh
if ($HAS_BEAMLINE == 1) then
    killall -KILL cb_mon_mots.py
    killall -KILL cb_plot_monitor.py
    killproc xmgrace
    killproc ss_arch
    killall -KILL gv
    killall -KILL gs
    killall -KILL lynx
    killall -KILL chooch
endif
jpeg_engine $JPEG_ENGINE_PORT >> & /dev/null&
sleep 1
if ($HAS_BEAMLINE == 1) then
    $CBHOME/cb_mon_mots.py>>/dev/null&
    ss_arch -icon ~epics/expmnt/ioc/save_sets/x12c_saveall.cfg>>/dev/null&
    $CBHOME/cb_plot_monitor.py>>/dev/null&
endif
# start the diffractometer server
rsh -n $GON_HOST "$CONFIGDIR/startgon" &
sleep 4
# start the detector servers
if ($DETECTOR_OFFLINE == 0) then
    echo "Start ADSC Detector Servers."
    setenv Q4HOME /home/pxsys/q315
    source $Q4HOME/LOGIN_files/log_315
    /usr/local/bin/start_315_procs
    sleep 1
endif
sleep 10
# start the cbass server
$CBHOME/cbass_server.py>> & cbass_outfile.txt&
sleep 10
# start the cbass gui
$CBHOME/gui.py
Most of the lines are associated with setup and environment. Note the lines that start major components. In order, these are: diffractometer (rsh -n $GON_HOST "$CONFIGDIR/startgon" &), detector (/usr/local/bin/start_315_procs), cbass server ($CBHOME/cbass_server.py), and cbass gui ($CBHOME/gui.py). By carefully cutting and pasting (in order) from that script to the command line on the "h" machine, you can usually identify what is causing the problem.
X29 USB/Serial Port Procedures
If you need to disconnect/reconnect the large USB/Serial Port box in the X29 rack, or cycle its power, follow the instructions below.
1. Make sure the power is on to the IO Box
2. Make sure that the white USB link is plugged in to the IO Box and to x29-h
3. "su -" to become root
4. Run "./fixUSBports" and wait
5. Log out of root and restart CBASS