Embedded Systems Software, Computer Networking and Geeky Fun

nerd1951.com

September 30, 2008

Hacker’s food

Filed under: Tools, Geeky Fun, Rants, Programming — harvey.sugar @ 4:23 pm

Some things that help make a normal life pleasant can get to be distractions when you’re way behind schedule on a project. Nerds are legendary for ignoring these distractions when a technical challenge requires their full attention. Details like hygiene and nutrition are the first casualties in battles against bugs and deadlines. It’s really hard to be fresh smelling and perky looking when you’re in the middle of marathon systems integration problems and have been working for eighteen hours straight.

Over the last couple of weeks, I’ve really noticed a decline in my eating habits. I usually try to prepare my food from fresh, unadulterated ingredients: lots of vegetables, beans, salad and grains, and a good bit of meat too. I cook from scratch and watch the carbs and fat. That is, until I hit systems integration.

I started out last week with salads for lunch and homemade chili for dinner. When the chili and the lettuce ran out, I switched to carryout food. I tried sticking to wholesome stuff: the local kabob place (easy on the rice) and Chipotle, which is quite healthy and tasty without the rice and tortillas.

Then I started eating at odd hours, late at night or very early in the morning. I switched to the diet of the legendary first generation of hackers at MIT, like Richard Stallman. I started alternating between Chinese carryout and pizza, washed down with lots of Coke.

I knew I had hit bottom this morning. Taco Bell is open late around here. They call the time between midnight and two AM “The Fourth Meal.” I found myself driving to the Taco Bell to get there before closing so I could get mine. A few hours later I was at McDonald’s getting a sausage, egg, and cheese McGriddle and another Coke.

I’m lost now and I’ll admit it. I’m not eating again until I can cook something for myself. Right after a shower and a twelve-hour nap.

• • •
 

September 24, 2008

It’s (almost) never the hardware

Filed under: Rants, Programming — Harvey @ 10:47 pm

We embedded systems programmers have some special challenges in our work.  One of the major problems we face is that the hardware we work with is often not fully exercised until our software exercises it.  So, it is tempting to blame the hardware when things don’t work right and the cause is not obvious.  I’ve done my share of blaming the hardware, sometimes to the point of embarrassment when it turned out I was wrong.

Over the years I’ve learned to work with the hardware engineers to solve a problem rather than point my finger.  I’ve learned the hard way to devise some low-level tests to isolate a suspected hardware problem before I go bother the hardware designer.  Sometimes a ’scope or a logic analyzer is the best software debugging tool an embedded systems programmer can have.  But sometimes you run across a bizarre software problem that masquerades as an obvious hardware problem.  A couple of these kinds of bugs will change your approach to debugging forever.

Will Rogers once said “There are three kinds of men. The one that learns by reading. The few who learn by observation. The rest of them have to pee on the electric fence for themselves.”  For those who learn by reading or observation here are a couple of war stories about obvious hardware problems that weren’t.

The first story involves a very successful piece of test equipment.  If you ever worked with T1s and I told you the model number you would probably recognize it.  I worked on this product toward the end of its life cycle.  I did some of the last feature upgrades and became responsible for software maintenance on it.  This test set used a popular Intel counter-timer chip.  The same chip was used in the original IBM PC and there are incarnations of it in the system chips of personal computers to this day.  NEC was a second source for this part but for some reason this test set would not work with the NEC version of the timer chip.  There was even a note on the BOM (Bill of Materials) that only the Intel part could be used.

Well, one day, something fell through the cracks and a batch of these units was built with the NEC part.  I got a call from production test that the software was failing in these units.  So, I went over to the factory and picked up one of the failing units to look at with an ICE (in-circuit emulator).  As soon as I took the unit apart, I saw the NEC timer and knew that was the problem, but when I called the production manager, he insisted that I take a second look at the software.  We used the same NEC timer in several other products with no problems, and it was getting harder to get the Intel part.  I was certain that it was not a software problem.  There were literally hundreds of these test sets in the field with this software, working just fine.  I couldn’t very well say no, so I set up a couple of breakpoints and ran the unit through its paces.  As I single-stepped through the ISR for the timer for about the tenth time, I noticed that it pushed one more register on the stack than it popped off at the end.  There it was in x86 assembly code, a function that should have crashed every time no matter what timer chip was used.  I fixed the routine and the test set worked just fine with the NEC part from then on.  To this day I have no idea why the problem never showed up with the Intel timer chip, but if it weren’t for the persistence of that production manager, that bug might have shown up at another time in another way.
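For readers who want to hunt for this kind of bug without single-stepping, here is a minimal sketch of a push/pop balance check written as a host-side Python script.  The listing, the label timer_isr and the register names are made up for illustration; the original routine is not reproduced here.  The checker only counts plain push and pop mnemonics (not pusha, popa, pushf or direct stack pointer arithmetic), so treat it as a rough lint rather than a real analysis tool.

```python
def stack_imbalance(listing: str) -> int:
    """Return (number of push) minus (number of pop) in an assembly listing.

    Zero means the routine is balanced. Only the plain 'push' and 'pop'
    mnemonics are counted; everything else is ignored.
    """
    depth = 0
    for raw in listing.splitlines():
        # Drop ';' comments and surrounding whitespace.
        code = raw.split(";", 1)[0].strip().lower()
        tokens = code.split()
        # Skip a leading label such as 'timer_isr:'.
        if tokens and tokens[0].endswith(":"):
            tokens = tokens[1:]
        if not tokens:
            continue
        if tokens[0] == "push":
            depth += 1
        elif tokens[0] == "pop":
            depth -= 1
    return depth


# Hypothetical ISR with the same class of bug: three pushes, two pops.
BUGGY_ISR = """
timer_isr:
    push ax
    push bx
    push dx        ; saved on entry...
    ; ... service the timer ...
    pop bx         ; ...but dx is never restored
    pop ax
    iret
"""

if __name__ == "__main__":
    print("imbalance:", stack_imbalance(BUGGY_ISR))
```

A nonzero result means the return instruction will pick up the wrong words from the stack, which is why a routine like this should fail no matter which timer chip is fitted.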

The second story is about another very popular product from the same company.  I never worked on that product but a coworker told me about this one.  Test equipment tends to have a very long product lifetime.  This product had undergone so many upgrades that they decided that it was time for a major software rewrite.  The project went well until system testing.  It seemed that the units in the lab worked just fine but when they were buttoned up with the covers on, the software crashed.  How could the presence of the cover affect the software?  It seemed like a classic hardware problem.  Perhaps it was a problem with noise or temperature.  The mystery was that the previous software release worked just fine, with or without the covers on.  Debugging this was a nightmare.  You couldn’t hook up an ICE with the cover on the unit, so they had to write test code, install it, and put the cover back on.  There didn’t seem to be any rhyme or reason to the software crashes.  Meanwhile they were also pursuing another seemingly unrelated problem.  If two of the pins on the RS-232 interface were shorted together (like RTS to CTS), the software would crash as soon as the test set was powered up.  It turned out that in the startup code in the new version, someone had enabled interrupts before the rest of the hardware was initialized.  Shorting the RS-232 pins together or putting the cover on the unit added just enough noise on a floating interrupt line to trigger an interrupt.  Since the interrupt vector table had not been initialized at this point, the interrupt vectored to an invalid address and crashed the system.  Yep, another classic hardware problem that turned out to be software.
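The ordering problem in this story is easy to model on a desktop machine.  The sketch below is a toy Python model, not the product's startup code: ModelCpu, Crash, startup and the IRQ number are all invented for illustration.  It shows the same spurious interrupt arriving at the same moment during startup, once with interrupts enabled before the vector table is filled in and once with them enabled last.

```python
class Crash(Exception):
    """Raised when the model CPU vectors through an uninitialized entry."""


class ModelCpu:
    """Toy CPU: a vector table, an interrupt-enable flag, pending IRQs."""

    def __init__(self):
        self.vectors = {}                # IRQ number -> handler, empty at reset
        self.interrupts_enabled = False  # interrupts are masked at reset
        self.pending = set()
        self.handled = []

    def install_vectors(self, handlers):
        self.vectors.update(handlers)

    def enable_interrupts(self):
        self.interrupts_enabled = True
        self._dispatch()

    def raise_irq(self, irq):
        """An interrupt line asserts, e.g. noise on a floating input."""
        self.pending.add(irq)
        self._dispatch()

    def _dispatch(self):
        if not self.interrupts_enabled:
            return                       # stays pending until enabled
        for irq in sorted(self.pending):
            self.pending.discard(irq)
            handler = self.vectors.get(irq)
            if handler is None:
                raise Crash(f"IRQ {irq} vectored to an uninitialized entry")
            handler(self, irq)


SPURIOUS_IRQ = 4  # hypothetical line that picks up the noise


def default_handler(cpu, irq):
    cpu.handled.append(irq)


def startup(cpu, enable_first):
    """Run the two startup steps in either order, with noise in between."""
    steps = [cpu.enable_interrupts,
             lambda: cpu.install_vectors({SPURIOUS_IRQ: default_handler})]
    if not enable_first:
        steps.reverse()                  # correct order: vectors, then enable
    steps[0]()
    cpu.raise_irq(SPURIOUS_IRQ)          # the noise arrives mid-startup
    steps[1]()


if __name__ == "__main__":
    good = ModelCpu()
    startup(good, enable_first=False)
    print("vectors first: handled", good.handled)
    try:
        startup(ModelCpu(), enable_first=True)
    except Crash as exc:
        print("enable first:", exc)
```

In the model, as in the story, the noise itself is harmless.  What matters is whether anything is allowed to vector before every entry in the table points somewhere sane.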

So what should you take away from these stories?  Never assume the problem is hardware.  If it seems like it is hardware, work with the hardware engineer to solve the problem.  Don’t just point your finger.  Use a ’scope or a logic analyzer if you know how, and if you don’t, get an EE to drive for you.  Some very bizarre software bugs can disguise themselves as hardware problems.

• • •
 

September 12, 2008

Geeky Humor and the Large Hadron Collider

Filed under: News, Geeky Fun — Harvey @ 2:12 pm

The Large Hadron Collider has been in the news a lot lately as they begin the process of bringing it online. Any physics experiment on this scale is bound to inspire some geeky humor (I don’t believe that’s an oxymoron). Here is a sampling of what I’ve found on the net:

First, the question on the minds of many:
Has the Large Hadron Collider Destroyed the Earth Yet?

Next is the LHC Rap. This is actually pretty well produced and explains what they are hoping to accomplish. When I watch this, though, I can’t help but think that all those people dancing around probably have PhDs and learned more math by the time they were twelve than I’ll ever know.

And finally, you can always count on XKCD as a source of relevant (pardon the pun) nerdly humor:

I also found a funny play on words or two, but they’re not suitable for a non-X-rated website. I’ll leave those to your imagination. A Google search for them is an optional exercise for the reader.

• • •
 

September 9, 2008

Open source voting machines?

Filed under: News, Rants — Harvey @ 9:38 pm

Every once in a while I go over to techdirt for some interesting reading while I’m “waiting for something to compile.” They’ve had a number of posts about e-voting and voting machines in general. One in particular caught my eye: Palm Beach County Lost 3,400 Votes; Claims Different Sequoia Scanners Count Differently. One of the issues that they raised is that it is difficult for local jurisdictions to verify the operation of voting machines because the manufacturers have been highly resistant to independent inspection of the systems.

In another article on techdirt, E-Voting Isn’t Perfect, But It Takes Less Work to Corrupt Big Elections, they explain why e-voting makes it so much simpler to rig elections. The article references a paper, Voting System Risk Assessment via Computational Complexity Analysis by Rice computer scientist Dan Wallach. Being the nerd that I am, I downloaded the paper and read it. It’s pretty scary if you believe in democracy.

I’m sure that the technical community can do better than this. If we don’t want companies like Diebold and Premier counting our votes using flawed equipment and security features, I think we need to do something about it ourselves. I think we should form an organization to develop an open source voting machine. I’m sure there are some very talented computer security researchers out there who could design procedures that would make vote tampering nearly impossible. Just look at OpenSSL, which is used for thousands of financial transactions every day. I also think that the open source community is better at developing reliable software than for-profit enterprises that are more interested in minimizing cost than maximizing value.

An open source voting machine would have the following advantages:

  • The complete design (hardware, software, and procedures) would be open for inspection by any government body or independent panel.
  • The project could attract the best and brightest in computer and systems security to ensure state of the art fraud prevention.
  • The project would also attract top talent in hardware and software development.
  • Local government agencies could have competitive bidding for the manufacture of the voting machines. The machines could even be made available to budding democracies in other countries.
  • An open source community would be more willing to admit to flaws and mistakes and correct them.
  • If fraud is suspected the open source community would be available to analyze the evidence.

I don’t know if this is a crackpot idea or a chance for us techies to help preserve democracy. For all I know, there may already be a project like this going. I would be interested in what others think of this idea and whether anyone might even want to work on a project like this.

• • •