Data Source Validation Queries Exist for a Reason!
My application uses Pentaho, a report generator written in Java; obviously Pentaho needs a data source, which allows it to connect to a database, which allows it to actually generate reports and thus to fulfill its role in life.
As everyone knows, sometimes database connections go bad; maybe the database restarted, or the network went down, or the phase of the moon changed, or some other weird problem happened.
The data source library can handle these issues for you. Before Pentaho issues its own query, the library can send a test query, and if the test query fails, the library will drop that connection, create a new connection, test that new connection, and finally Pentaho can use a known-good connection. Everyone is happy!
It turns out the test queries only happen if you ask for them. If you don’t configure the data source correctly, it never tests the connections, and Pentaho may end up using a bad connection forever, or until a client or your boss calls you, and you lose valuable minutes of your life restarting Pentaho.
Well, somehow I forgot to add this configuration to the automated installer for my program. Now I have a dozen sites to fix. Manually logging into a dozen sites, opening a config file, adding a few entries, saving the config file, and restarting Pentaho sounds like a lost afternoon to me.
Ansible to the rescue. I already have Ansible modules to update my sites. I only have to find out how Ansible can add entries to Pentaho’s context.xml
file:
- name: read data source config file command: cat /opt/pentaho/pentaho-server/tomcat/webapps/pentaho/META-INF/context.xml register: datasource changed_when: false - name: add validation query if necessary replace: path: /opt/pentaho-server/tomcat/webapps/pentaho/META-INF/context.xml regexp: '(.*)name="jdbc/myDataSource"(.*)' replace: '\1name="jdbc/myDataSource" validationQuery="select 1" \2' backup: yes when: "'name=\"jdbc/myDataSource\" validationQuery=\"select 1\"' not in datasource.stdout"
It means: read the data source config file; if the data source doesn’t already have a validation query, then add one. The when in the code is important, so as to achieve idempotence: the change is only applied if it needs to be applied.
After testing this on my local virtual machine, I ran it on my half-a-dozen development sites, and all was well.
Now all I have to do is meditate on my past failures, as to how something so obvious could have gone so wrong from the very start… how did I ever deploy a data source config with no query validation?