Posts tagged "lessons-learned":

17 Oct 2018

How Not to Manage Server Configs

After working in software a few years, most people end up with a rich store of knowledge; unfortunately, this knowledge is mostly about how to do things wrong. I know lots of ways to not manage a 200-person government software project, having been on several such beasts.

Four years into my current project (my longest tenure on a single project in 30 years of working life), I know lots of ways to not manage a server farm. In this case, unlike the huge government software projects, I am culpable: I set up the system myself, either directly or through guidance and code reviews for my programmers.

My (broken) system is based on Git branches:

core: default branch, should work out of the box on developer installations
- foia: branch for the Freedom of Information Act version of the product
  - foia-dev-server: branch for the FOIA dev server
  - foia-test-server: branch for the FOIA test server
  - foia-customer: branch for one of our FOIA customers
    - foia-customer-dev-server: branch for the customer dev server
    - foia-customer-test-server: branch for the customer test server
- core-customer: branch for one of our core customers
  - core-customer-dev-server: branch for the core customer dev server

And so on, and so on.

Commits to a branch flow to all downstream branches, so commits to core are applied to every branch; commits to foia are applied to the downstream foia-* branches; and so on.
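To make the cost of this flow concrete, here is a minimal sketch that encodes the (abbreviated) branch tree above as a parent-to-children map and lists the merges a single commit sets off. The map and the class name BranchPropagation are hypothetical illustrations; the real repository is just Git branches, and nothing here runs Git.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class BranchPropagation {

    // Hypothetical encoding of the branch tree: parent -> direct children.
    static final Map<String, List<String>> CHILDREN = new LinkedHashMap<>();
    static {
        CHILDREN.put("core", List.of("foia", "core-customer"));
        CHILDREN.put("foia", List.of("foia-dev-server", "foia-test-server", "foia-customer"));
        CHILDREN.put("foia-customer", List.of("foia-customer-dev-server", "foia-customer-test-server"));
        CHILDREN.put("core-customer", List.of("core-customer-dev-server"));
    }

    // Returns the "parent -> child" merges needed to propagate a commit made on
    // the given branch, breadth-first, so each parent is updated before its
    // own children are merged from it.
    static List<String> mergesFrom(String branch) {
        List<String> merges = new ArrayList<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(branch);
        while (!queue.isEmpty()) {
            String parent = queue.poll();
            for (String child : CHILDREN.getOrDefault(parent, List.of())) {
                merges.add(parent + " -> " + child);
                queue.add(child);
            }
        }
        return merges;
    }

    public static void main(String[] args) {
        // Every printed line is one merge, and one more chance for a conflict.
        mergesFrom("core").forEach(System.out::println);
    }
}
```

Even this shortened tree turns one commit to core into eight merges, and every branch added to the tree adds at least one more merge to every upstream commit.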

Three facts make this a terrible system. Fact 1: the repository includes binary files (spreadsheets). Fact 2: my application updates many of these files at runtime. Fact 3: there is no hierarchy of configuration values; each branch is a complete copy of the entire configuration.

You may already see the obvious problem: such a system can't be maintained automatically. The all-too-frequent merge conflicts mean I spend much of my time caring for and feeding the dev, test, and production systems. Merge conflicts in the spreadsheets are super-painful, because Git cannot merge binary files line by line; every such conflict has to be resolved by hand.

So what is to be done? How can I fix it?

First, I have to establish a separation of responsibilities by setting up a hierarchy of configuration values. Default values (the core branch in my example Git branch tree) are maintained by the core application; values for the FOIA extension are maintained only by the FOIA version of the application; customer-specific values are maintained in only one place; and runtime changes are kept separate. When the application needs a configuration value, it looks first for runtime values; then for server-specific values; then for extension values; then for core values… Obviously this lookup would be done by a library, so application code just looks up a key, as it does now.
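The lookup order above amounts to "first layer that defines the key wins." Here is a minimal sketch, assuming each layer has already been loaded into a flat Map<String, String>; the class LayeredConfig, the layer names, and the keys and values are all hypothetical, not an existing library.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class LayeredConfig {

    // One named layer of settings, e.g. "runtime", "server", "extension", "core".
    record Layer(String name, Map<String, String> values) {}

    private final List<Layer> layers;

    // Layers must be supplied most-specific first: runtime, server, extension, core.
    LayeredConfig(List<Layer> layersMostSpecificFirst) {
        this.layers = List.copyOf(layersMostSpecificFirst);
    }

    // Returns the value from the first (most specific) layer that defines the key.
    Optional<String> get(String key) {
        for (Layer layer : layers) {
            String value = layer.values().get(key);
            if (value != null) {
                return Optional.of(value);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical values for a FOIA test server, for illustration only.
        LayeredConfig config = new LayeredConfig(List.of(
            new Layer("runtime", Map.of()),
            new Layer("server", Map.of("db.url", "jdbc:postgresql://foia-test/app")),
            new Layer("extension", Map.of("ui.title", "FOIA")),
            new Layer("core", Map.of("ui.title", "Core", "db.url", "jdbc:h2:mem:dev", "page.size", "20"))));
        System.out.println(config.get("ui.title").orElse("?"));   // FOIA
        System.out.println(config.get("db.url").orElse("?"));     // jdbc:postgresql://foia-test/app
        System.out.println(config.get("page.size").orElse("?"));  // 20
        System.out.println(config.get("missing").orElse("?"));    // ?
    }
}
```

Each layer then has exactly one owner, and a runtime change touches only the runtime layer, never the files under version control. Spring's Environment resolves properties from an ordered list of property sources on the same first-match principle, so it may be worth checking whether it can serve as that library before writing one.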

Second, establish a uniform representation. Everything is YAML; if it's not YAML, it's not configuration. Replace our spreadsheets with YAML, then, as the application starts, load the YAML data into the same structures we load the spreadsheets into now. Get rid of all the Spring configuration files; build the Spring beans from the YAML structure.
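One way to get from YAML files to the flat key lookup of the previous paragraph is to flatten the parsed document into dotted keys. This is a minimal sketch, assuming a YAML parser (SnakeYAML is a common choice in Java) has already turned the file into nested Map and List objects; the class name YamlFlattener and the sample data are hypothetical, and the parse step is replaced by a hand-built map so the example has no dependencies.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class YamlFlattener {

    // Flattens nested maps into dotted keys, e.g. {ui: {title: FOIA}} -> "ui.title" = "FOIA".
    // List elements are indexed as key[0], key[1], ...
    static Map<String, String> flatten(Map<String, ?> tree) {
        Map<String, String> flat = new LinkedHashMap<>();
        flattenInto("", tree, flat);
        return flat;
    }

    private static void flattenInto(String prefix, Object node, Map<String, String> out) {
        if (node instanceof Map<?, ?> map) {
            for (Map.Entry<?, ?> e : map.entrySet()) {
                String key = prefix.isEmpty() ? String.valueOf(e.getKey()) : prefix + "." + e.getKey();
                flattenInto(key, e.getValue(), out);
            }
        } else if (node instanceof List<?> list) {
            for (int i = 0; i < list.size(); i++) {
                flattenInto(prefix + "[" + i + "]", list.get(i), out);
            }
        } else {
            // Scalars (strings, numbers, booleans) become string values.
            out.put(prefix, String.valueOf(node));
        }
    }

    public static void main(String[] args) {
        // Stand-in for a parsed YAML document.
        Map<String, Object> parsed = new LinkedHashMap<>();
        parsed.put("ui", Map.of("title", "FOIA"));
        parsed.put("servers", List.of("dev.example.com", "test.example.com"));
        flatten(parsed).forEach((k, v) -> System.out.println(k + "=" + v));
        // ui.title=FOIA
        // servers[0]=dev.example.com
        // servers[1]=test.example.com
    }
}
```

The dotted-key and [index] convention matches the way Spring Boot flattens YAML into properties, which would keep the keys familiar if the Spring beans are later built from the same data.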

In this way we should get automated, conflict-free system administration.

16 Oct 2018

Code Reviews Make for Social Programming

A long time ago on a project long dead, I set up a code review system. At that time I was naive: the code review started after the code was deployed to the development server. If it seemed to work, the project manager was happy, and no one really cared about the code reviews. So I switched our project from Subversion to Git and set up a git-flow model, where code lands in feature branches and must be reviewed before it gets deployed anywhere.¹

As I am not a project manager or a software development manager or a scrum master, I have limited influence over how my programmers actually work, but I do exercise despotic control over the code itself. This one change (enforcing code reviews) is the best thing I ever did in terms of code quality.

First, it lets everyone learn from each other. I learned about Java 8 streams and lambdas from reviewing other people’s code. Some of the other guys have learned from my comments.

Second, sometimes we catch real issues before the code lands. At least twice in the last two weeks, code reviews revealed a completely wrong path taken by the programmer, or a complete misunderstanding of the requirements. Now, it would be really nice if we caught these issues even earlier, before the programmer invests days going down the wrong path. Still, better to catch it in code review than after customer deployment.

Obviously the reviews are only as good as the reviewers. On another project in my company, “code reviews” consist of reviewers checking out the feature branch, running it locally, and seeing if it works… so they are just duplicating the work of the system tester. They are not reviewing for correctness, coding standards, security, efficiency, architectural adherence, etc.

Other times we review according to our history with the programmer. I trust some of my programmers more than others. This week, my most senior programmer sent me a short code review. I reviewed it very quickly and accepted it. During the build some unit tests failed! The test failures were from really sloppy code this programmer ordinarily would never have turned in, and that I would never have accepted, if it came from a more junior guy that I didn’t trust so much.

Good judgment in code reviews is a precious commodity. A couple of my programmers are very good and thorough code reviewers. If I had another couple reviewers like that, our product would be much better.

Footnotes:

¹ Note that git-flow has its detractors: https://hackernoon.com/gitflow-is-a-poor-branching-model-hack-d46567a156e7

11 Oct 2018

Know the Platform

Earlier today I wrote about the pain of learning a new toolkit. Afterward it occurred to me that I never invested in IDEA or Eclipse the way I'm being forced to invest in Emacs. That made me think of all the other tools I use: Spring, JPA, Java, JUnit, integration testing… I know a little about lots of things… but only enough to get through my daily assignments.

Focusing on getting through my daily assignments, without a deep understanding of at least three or four of the really key tools, makes me a corporate drone, not a professional software developer.

Is being a drone really bad? I do value keeping a roof over my head, and of course my first loyalty is to my employer. But, part of what my employer values me for is in fact a certain level of professionalism, more than I’ve shown so far.

How to transition from drone to professional?

First: continue to keep my employer happy!

Second: learn:

Toolchain: Emacs, Java, Ansible
Platform: Spring Boot, Kubernetes
Practices: specification, testing
Patterns: Domain-driven design

Third: contribute. Give back to at least one of the above communities… contribute some code, answer questions on Stack Overflow, something!

Fourth: practice. Start a side project… and finish it!

A lot to take on, to be sure. I’ll post back here on my progress.

08 Oct 2018

Samba Over ApacheDS Over OpenLDAP

My application has internal users and groups, with role-based access control based on group membership… in other words, it’s a plain old corporate business application, not a retail app or a social network where anyone can sign in with Google, Facebook, Amazon etc.

So I need a directory server. In the dawn of time (a few years ago) I gave my installation team more freedom to choose their own components… Since our target platform is Linux, they chose OpenLDAP.

OpenLDAP is a major pain in the ass to configure to our requirements. I wrote ever so many lines of shell script to add a partition (e.g. arkcase.com); enable TLS; add three or four LDAP schema elements required by my application, but not loaded by default into OpenLDAP; and change the default admin password.

All this was on CentOS 6. Well, when CentOS 7 came out, all these scripts stopped working! The CentOS 7 OpenLDAP packages broke something. I didn’t have enough guts to wade back into the code.

Where could we go from OpenLDAP? Since my application is written in Java, next we chose ApacheDS. Still more oodles of shell script required! In ApacheDS, the official documentation to add a partition says to use a GUI tool! The only way to automate it is to use the tool once, record the LDIF it generates, and write script code to generate the same LDIF.

And it didn't take long before we had real problems. One of our teams had to load 5,000 test users, and ApacheDS choked on the load script: too much traffic all at once. Throw in random exceptions and occasional corrupted data stores, and we were ready to move on again.

Since pretty much all our customers are on Active Directory, we really wanted an Active Directory-compatible directory server that is free and runs on Linux. Actually we had these three requirements (AD-compatible, free, Linux) this entire time, so the fact it took me so long to find Samba 4 just goes to show what a bad architect I am.

The Samba guide looks intimidating, but the process couldn't be simpler, especially if you only need Active Directory compatibility in terms of directory services (not so much DNS, file shares, Kerberos, certificate management…). All my oodles of shell script boil down to 150 lines of Ansible directives, most of which set up a folder structure. After configure; make; make install, I have a working Active Directory-compatible server, complete with partition, TLS support, and my desired administrator password.

The lesson is simple, but I seem to forget it just as often as I learn it: always go for the service that most closely meets your requirements. Samba is free, runs on Linux, is compatible with Active Directory, and is easy to install and configure.