Posts tagged "cone-of-shame":
Cutting and Pasting Doesn't Work Across Different File Types!
I know this post’s title isn’t a big revelation or anything. But this very error cost me two hours of frustration today. When I review other people’s code, or help them solve their problems, frequently I preach on how often problems lie in subtle details; as with all good advice, even the person that dishes it out sometimes needs to take it.
A couple weeks back I completed my ActiveMQ setup, with a secure database connection over TLSv1.2. Today, I was in the midst of my Alfresco CE setup, and I needed a secure database connection over TLSv1.2… in fact, a connection to the very same database server as with ActiveMQ, only to a different schema.
Being lazy, I cut-and-pasted the working connection string from the working ActiveMQ setup, to the in-progress Alfresco setup, and started Alfresco.
Alfresco blew up! “Access denied to user ABC@X.Y.Z”.
Here is where I went wrong… but in a good way. Instead of closely examining the working connection string (ActiveMQ) and the non-working connection string (Alfresco), I refactored some code and updated the TLS configuration routines. I ended up with a better understanding of how OpenSSL certificates and keys become Java trust stores and key stores, and a much improved Tomcat setup. All these good things I may or may not have gotten to, if my database connection had worked right away.
After all this work, Alfresco still threw up. “Access denied to user ABC@X.Y.Z”.
Then it came to me… ActiveMQ’s connection string is in the file activemq.xml… an XML file. Alfresco’s connection string is in the file alfresco-global.properties… a properties file.
A connection string is a URL, with options separated by an ampersand; the & character is reserved in XML, so you have to escape it as &amp;.
This works fine in an XML file, not so much in a properties file:
jdbc:mysql://host:port/database?autoReconnect=true&amp;useUnicode=true&amp;...
Such was my working connection string in the XML file. That same string didn’t work so well in the properties file, where the ampersand is just another character, and needs no escaping.
This works fine in a properties file, not so much in an XML file:
jdbc:mysql://host:port/database?autoReconnect=true&useUnicode=true
Now both ActiveMQ and Alfresco are working quite well, thank you very much.
I’m ashamed to say after I first saw the problem, the next thing I tried was broken in both XML and properties:
jdbc:mysql://host:port/database?autoReconnect=true;useUnicode=true;
It goes without saying semicolons have no meaning in a URL, and this intermediate effort threw up just as hard as the first one. Some days are longer than others!
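To see the difference concretely, here is a minimal Python sketch of what each file format does to the same pasted text. It is only an illustration, not how ActiveMQ, Alfresco, or the MySQL driver actually read their settings: the `<property>` element, the `db.url` line, and the naive option splitter are stand-ins assumed for the example.

```python
import xml.etree.ElementTree as ET

# The string as it sits in the XML file, with the ampersand escaped.
XML_VALUE = "jdbc:mysql://host:port/database?autoReconnect=true&amp;useUnicode=true"


def from_xml(escaped):
    """Read the value as an XML attribute; the parser turns &amp; back into &."""
    element = ET.fromstring(f'<property name="url" value="{escaped}"/>')
    return element.get("value")


def from_properties(line):
    """Stand-in for a properties reader: split on the first '=', decode nothing."""
    _key, _, value = line.partition("=")
    return value.strip()


def option_names(jdbc_url):
    """Naive stand-in for a driver: options follow '?' and are separated by '&'."""
    query = jdbc_url.split("?", 1)[1]
    return [pair.split("=", 1)[0] for pair in query.split("&")]


print(option_names(from_xml(XML_VALUE)))
# ['autoReconnect', 'useUnicode']
print(option_names(from_properties("db.url=" + XML_VALUE)))
# ['autoReconnect', 'amp;useUnicode']
print(option_names("jdbc:mysql://host:port/database?autoReconnect=true;useUnicode=true;"))
# ['autoReconnect']
```

In the properties case the second option arrives under the name amp;useUnicode, which nothing recognizes, so every option after the first is effectively lost. The semicolon version is worse: the whole tail ends up inside the value of the first option.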
How Not to Manage Server Configs
After working in software a few years, most people end up with a rich store of knowledge; unfortunately, this knowledge is mostly about how to do things wrong. I know lots of ways to not manage a 200-person government software project, having been on several such beasts.
Four years into my current project (my longest tenure on a single project in my 30 years of work life), I know lots of ways to not manage a server farm. In this case (unlike the huge government software projects) I am culpable; I set up the system myself, either directly, or through guidance and code reviews to my programmers.
My (broken) system is based on Git branches:
core: default branch, should work out of the box on developer installations
  foia: branch for the Freedom of Information Act version of the product
    foia-dev-server: branch for the FOIA dev server
    foia-test-server: branch for the FOIA test server
    foia-customer: branch for one of our FOIA customers
      foia-customer-dev-server: branch for the customer dev server
      foia-customer-test-server: branch for the customer test server
  core-customer: branch for one of our core customers
    core-customer-dev-server: branch for core customer dev server
And so on, and so on.
Commits to a branch flow to all downstream branches, so commits to core are applied to every branch; commits to foia are applied to the downstream foia-* branches; and so on.
Three facts make this a terrible system. Fact 1: the repository includes binary files (spreadsheets). Fact 2: my application updates many of these files at runtime. Fact 3: there is no hierarchy of configuration values; each branch is a complete copy of the entire configuration.
You may already see the obvious problem; such a system can’t be maintained automatically. The all-too-frequent merge conflicts cause me to spend much of my time caring for and feeding the dev, test, and production systems. Merge conflicts in the spreadsheets are super-painful.
So what is to be done? How can I fix it?
First, I have to establish a separation of responsibilities, by setting up a hierarchy of configuration values. Default values (the core branch in my example Git branch tree) are maintained by the core application; values for the FOIA extension are maintained only by the FOIA version of the application; customer-specific values are maintained in only one place; and runtime changes are kept separate. When the application needs a configuration value, it looks first for runtime values; then for server-specific values; then for extension values; then for core values… Obviously this is done by a library so application code just looks up a key, like it does now.
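The lookup order described above can be sketched in a few lines of Python with collections.ChainMap, which searches a list of mappings in order and returns the first match. This is only a sketch of the idea, not the library the project would use; every key and value below is hypothetical.

```python
from collections import ChainMap

# Hypothetical layers; each one holds only the values it is responsible for.
core = {"app.title": "Core", "search.page_size": 25, "mail.host": "localhost"}
extension = {"app.title": "FOIA"}              # FOIA overrides of core defaults
server = {"mail.host": "mail.test.example"}    # values for one specific server
runtime = {"search.page_size": 50}             # changes made while running

# Earlier maps win: runtime, then server, then extension, then core.
config = ChainMap(runtime, server, extension, core)

print(config["search.page_size"])  # 50, from the runtime layer
print(config["mail.host"])         # mail.test.example, from the server layer
print(config["app.title"])         # FOIA, from the extension layer

# Writes go to the first map only, so runtime changes never touch the defaults.
config["app.title"] = "FOIA (renamed at runtime)"
print(core["app.title"])           # Core
```

Because each layer stores only its own overrides, a change to a core default has nothing to conflict with in the server or customer layers.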
Second, establish a uniform representation. Everything is YAML; if it’s not YAML, it’s not configuration. Replace our spreadsheets with YAML, then as the application starts, load the YAML data into the same structures as we load the spreadsheet into now. Get rid of all the Spring configuration files; build the Spring beans from the YAML structure.
In this way we should get automated, conflict-free system administration.
Data Source Validation Queries Exist for a Reason!
My application uses Pentaho, a report generator written in Java; obviously Pentaho needs a data source, which allows it to connect to a database, which allows it to actually generate reports and thus to fulfill its role in life.
As everyone knows, sometimes database connections go bad; maybe the database restarted, or the network went down, or the phase of the moon changed, or some other weird problem happened.
The data source library can handle these issues for you. Before Pentaho issues its own query, the library can send a test query, and if the test query fails, the library will drop that connection, create a new connection, test that new connection, and finally Pentaho can use a known-good connection. Everyone is happy!
It turns out the test queries only happen if you ask for them. If you don’t configure the data source correctly, it never tests the connections, and Pentaho may end up using a bad connection forever, or until a client or your boss calls you, and you lose valuable minutes of your life restarting Pentaho.
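A toy version of that behavior, as a Python sketch: it uses sqlite3 only so that it runs anywhere, and the TinyPool class and its method names are invented for illustration, not taken from the data source library Pentaho uses. The point is that validation is opt-in: with no validation query, the pool hands back whatever it has.

```python
import sqlite3


class TinyPool:
    """Toy connection pool; connections are tested only if a query is configured."""

    def __init__(self, connect, validation_query=None):
        self._connect = connect
        self._validation_query = validation_query
        self._idle = []

    def borrow(self):
        while self._idle:
            conn = self._idle.pop()
            if self._is_valid(conn):
                return conn
            # Bad connection: drop it and try the next idle one.
        conn = self._connect()
        if not self._is_valid(conn):
            raise RuntimeError("new connection failed validation")
        return conn

    def give_back(self, conn):
        self._idle.append(conn)

    def _is_valid(self, conn):
        if self._validation_query is None:
            return True  # nobody asked for a test, so nothing is tested
        try:
            conn.execute(self._validation_query).fetchone()
            return True
        except sqlite3.Error:
            return False


def borrow_after_outage(pool):
    """Borrow, return, break the connection, then borrow again."""
    first = pool.borrow()
    pool.give_back(first)
    first.close()  # simulate the database restarting under the pool
    return first, pool.borrow()


def new_connection():
    return sqlite3.connect(":memory:")


# With a validation query the dead connection is replaced.
first, second = borrow_after_outage(TinyPool(new_connection, "select 1"))
print(second is first)  # False

# Without one, the caller gets the dead connection back.
first, second = borrow_after_outage(TinyPool(new_connection))
print(second is first)  # True
```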
Well, somehow I forgot to add this configuration to the automated installer for my program. Now I have a dozen sites to fix. Manually logging into a dozen sites, opening a config file, adding a few entries, saving the config file, and restarting Pentaho sounds like a lost afternoon to me.
Ansible to the rescue. I already have Ansible modules to update my sites. I only have to find out how Ansible can add entries to Pentaho’s context.xml file:
- name: read data source config file
  command: cat /opt/pentaho/pentaho-server/tomcat/webapps/pentaho/META-INF/context.xml
  register: datasource
  changed_when: false

- name: add validation query if necessary
  replace:
    path: /opt/pentaho/pentaho-server/tomcat/webapps/pentaho/META-INF/context.xml
    regexp: '(.*)name="jdbc/myDataSource"(.*)'
    replace: '\1name="jdbc/myDataSource" validationQuery="select 1" \2'
    backup: yes
  when: "'name=\"jdbc/myDataSource\" validationQuery=\"select 1\"' not in datasource.stdout"
It means: read the data source config file; if the data source doesn’t already have a validation query, then add one. The when in the code is important, so as to achieve idempotence: the change is only applied if it needs to be applied.
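The same guard-then-replace logic can be tried outside Ansible. The Python sketch below mirrors the regular expression and the check from the tasks above; the sample Resource element is made up for illustration and is not Pentaho's real context.xml.

```python
import re

ATTR = 'name="jdbc/myDataSource"'
WANTED = ATTR + ' validationQuery="select 1"'
PATTERN = r'(.*)name="jdbc/myDataSource"(.*)'


def add_validation_query(text):
    """Add the validation query unless it is already there (the 'when' guard)."""
    if WANTED in text:
        return text  # nothing to do, so a second run changes nothing
    return re.sub(PATTERN, r"\g<1>" + WANTED + r" \g<2>", text)


# Hypothetical one-line data source definition.
sample = '<Resource name="jdbc/myDataSource" auth="Container" />'

once = add_validation_query(sample)
twice = add_validation_query(once)

print(once == twice)                  # True
print(once.count("validationQuery"))  # 1
```

Without the guard, the pattern would still match on the second run and a second validationQuery attribute would be appended, which is exactly what the when condition prevents.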
After testing this on my local virtual machine, I ran it on my half-a-dozen development sites, and all was well.
Now all I have to do is meditate on my past failures, as to how something so obvious could have gone so wrong from the very start… how did I ever deploy a data source config with no validation query?