The consultant that designed our databases did an awful job. There's really no question about it. Just an awful, awful job. I feel like this job is some sort of anger management for me, because at the same time as I want to just take all of these computers and throw them outside the window with the consultant attached to them, that would be pretty psycho if I actually did that. Instead, I silently bang my head against the wall in my fantasies and just smile whenever someone is asking how my job is going. This is the work of some seriously demented and screwed up consultant that has no business, whatsoever, in IT. There are myriads of problems here, but I'll try to lay out the basic lines of what kind of work I have before me.
Let me start by describing what the databases are for: our scientists are conducting research related to rheumatic arthritis. In order to do this, they use this questionnaire. The questionnaire is about 37 pages, with all kinds of questions. I have quite a few thoughts about the questions themselves, but it's not my job to mess with that. What I need to do, is to develop some way for these people to take a questionnaire and put it into digital form. The whole idea is simple, really. You have lots of questions and the respondents answers are going into a database (or, as in this case, into myriads of different databases). Now, in an ideal world, with all these automated IT-solutions there would be a machine that would scan these questionnaires and do this for them. In the real world though, there simply isn't room in the budget for that kind of wizardry. So, let's take the next best solution, develop software for them to input the data into a big database. This is where I, and before me, the consultant comes in. This is what I do, this is my job.
Now, the consultant has done this already. But, with quite a few caveats that just isn't acceptable. Not by any means, imo. I feel like this consultant is a charlatan, trying to pass off as a database developer, that makes so many fundamentally wrong mistakes that he wouldn't have passed through our courses at university. So, what I need to do, is to develop a new database and software to input data, and migrate the old data. This would be a lot easier if there was one, or two, access databases with data. But there isn't. There is, at the time of writing, a total of 205 access databases, that are for the most part based on the same database, with a couple of variances such as empty, non-used failed tables (yes, he has experimented and left them there to rot), example-tuples (rows), non-used example attributes (columns) and so on and so forth. A lot of these access-databases contain fairly much of the same data, but not everything.
Most of the database files are just purely redundant copies of copies and to no purpose at all!
Most of the columns used for data, are named according to the questionnaire. The names are not AmntofCigs to refer to the amounts of cigarettes smoked, but merely F14 to refer that it's question 14 in the F section that is being answered. Now, I can understand his thoughts in doing this - this would save him some time and well, at first it would seem logical. However, it doesn't save time, and it isn't logical nor intuitive. It takes me forever to find what these different attributes mean, and where they are in the questionnaire (remember, it's a 37 page questionnaire with lots and lots of questions). The thing is that these questionnaires comes in different revisions, as time pass by and these people decides to scrap some questions, change some, move some, add new questions and so on. This research has been going on since 1996, so you can imagine that there needs to be some changes made. This also means that if a questions, for instance no. 14 from F, changes to become question 10 in the G section instead, you suddenly have a F14 column in a table for F mapped to a question in the G section of the questionnaire. This has happened, a lot. What happends when there is a new question 14 in the F section now? Well, the column can't be named F14 because there already is such a column. The new name of the new column becomes F14_2. Or so my predecessor thought. This has also happened. I have to deal with attributes and variables named F14x2_4 and so on.
The usage of tables is bad. It's so bad I just want to bang my head against the keyboard instead of the wall now. The consultant thought it would be a great idea to separate the tables according to the letters in each section, and stay there. I can understand him wanting to do that, and I can see myself doing somewhat of the same principle. But with better names for attributes. But this smeghead didn't think of creating sub-tables for questions that has more than one answer, i.e.
"What jobs have you had?". Normally, what you do, is you create a sub-table for that and map it to the respondent via primary keys and foreign keys. Oh, but noooo, that would be far too clever a solution for this smeghead. Instead, he created attributes, or again, columns for each job. So the table for section X has columns FOR EACH JOB that a respondent has had. Let's put this is an explanatory example for you:
- A job has a start date
- A job has an end date
- A job has a description
- A job has an employer
This is an incredibly simplified example, because in the real world, with this questionnaire, there are about 30 variables for each job. Let's call the start date
startDate (to follow the camelBack convention), ending date
endDate, job description
jobDescr and employer simply
empl. To add to that, you have the respondent ID which is the key used to map to the other tables, which we'll call rID. A really, really simple approach here. Ideally a table would look like this:
rID, startDate, endDate, jobDescr, empl
0 1990 2010 123 321 (Where 123 is a code for a certain job description found in another table, and the same principle for the employer)
This solution creates the opportunity of one person having as many different jobs as possible, since all it takes to add a job, is to add another row with the same rID.
This smeghead didn't do that. What he did was this:
rID, startDate1, endDate1, jobDescr1, empl1, startDate2, endDate2, jobDescr2, empl2, startDate3, endDate3, jobDescr3, empl3
...but instead of having 3 jobs, there are 13, and instead of 4 different attributes, there are 30, and instead of having one table solely for this, he put these columns into a whole table with columns for other questions as well, and instead of using intuitive naming conventions for his attributes, he used AKSDHFKAFSDH2311412341234 for his columns. This ended up too big for one table, so he divided this into several tables, with all of the different columns named F14x_2, F56, Fx2_3, and so on and so forth - with no thought behind this, what so ever.
This is what I do, this is what I need to clean up.