Technology

File Type Manager source code

Due to the renewed popularity of File Type Manager (mainly because of Windows Vista), I've decided to make the source code available. Maybe somebody else feels like working on this program some more. Keep in mind that I wrote this while I was in high school and just learning how to program, so it probably isn't the greatest code. Also, it's written in Visual Basic 6. Ugh.

File Type Manager 2.0.1 Source Code

Note that I've licensed it under the LGPL. It includes an ActiveX control that displays the file types, so you could re-use that somewhere else if you wanted to.

Working Part-Time, Moving, Starting Web 2.0 Project

I thought I would write a quick blog entry to update everyone on the latest happenings...

First off, since November 1st I've only been working part-time at GenoLogics, spending two days a week on a personal project. I came up with (what I think is) a really good idea for a Web 2.0 project, so I've cast away the chains of J2EE and I'm working with PHP and Drupal to create a nice Web 2.0 site. Yes, it will have all that AJAX goodness that the modern geek (user) is accustomed to.

Why PHP/Drupal, you ask? Well, I did look into Ruby on Rails and some Python frameworks. The thing is that I know PHP/Drupal very well, so I can be productive very quickly, and at this point I just didn't want to invest the time to learn a new framework. Also, Ruby on Rails just didn't really turn me on, although granted I spent very little time looking at it. With Drupal I get a lot of infrastructure that is already provided for me: security, comments, user profiles, theming, page generation, etc. I'm not sure why I would want to use Rails and roll it all myself. Having a large Drupal user community behind that infrastructure is also a big plus.

Anyway, the next thing is that I'm moving to Vancouver on December 1st. I'm getting a little bored in Victoria, and I think the technology scene in Vancouver will be better. I'm looking forward to checking out the Drupal and PHP user groups. Finding a place to live in Vancouver was pretty tough, but in the end I found a nice one-bedroom in Kitsilano. So I guess I'm all set. :-)

On another note: GenoLogics is hiring. You should apply. It's a good place.

Open a File in the Default Application using the Windows Command Line (without JDIC)

Quite a few people have asked me about this in the past. If you have a file, how can you open it in its default associated application without querying the registry or using some other Windows API? Or, if you program in Java, how can you do it without using JDIC?

The easiest way to do this is with the "start" command. For example, to open the file "readme.txt" in the default text editor you would do this:

C:\>start readme.txt

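You can also use start to open folders or follow shortcuts. When the name is in quotes, pass an empty "" first: start treats its first quoted argument as the window title, so start "My Shortcut" on its own would just open a new command prompt window titled "My Shortcut".

C:\>start "" "My Shortcut"    <-- note that you don't need .lnk at the end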

This will open the target of the "My Shortcut" shortcut. If the shortcut points to a folder, it opens a Windows Explorer window for it; if it points to a document, it opens the document in the default application; and if it points to a program, it launches the program.

The trick is that "start" isn't an executable. It is a built-in command of the Windows command line interpreter "cmd.exe". In Java (and other languages), trying to create a process from the "start" command fails, because there is no "start.exe" executable on the system.

Instead, you have to invoke "start" through the "cmd.exe" interpreter. This can be done using the /C flag:

cmd /c "start readme.txt"

This runs successfully in Java using Runtime.exec() or a ProcessBuilder, whereas calling "start" directly fails. The same limitation applies to other cmd.exe built-ins, such as dir, copy and del: if a command fails to launch, try running it through "cmd /c".
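
As a minimal Java sketch of the above (the class name StartCommand and its methods are hypothetical, not part of any library), the command can be handed to ProcessBuilder as a list of arguments. It assumes the file name contains no spaces or cmd.exe special characters such as & or ^; paths with spaces need extra quoting, which this sketch does not attempt.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

class StartCommand {
    // "start" is built into cmd.exe, so new ProcessBuilder("start", ...)
    // would throw an IOException. Route it through "cmd /c" instead.
    // Assumes fileName has no spaces or cmd.exe metacharacters (& | < > ^).
    static List<String> buildCommand(String fileName) {
        return Arrays.asList("cmd", "/c", "start", fileName);
    }

    // Launches the file's default application (Windows only).
    static Process open(String fileName) throws IOException {
        return new ProcessBuilder(buildCommand(fileName)).start();
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 1) {
            open(args[0]);
        } else {
            // Without an argument, just show the command that would run.
            System.out.println(buildCommand("readme.txt"));
        }
    }
}
```

On Windows, running java StartCommand readme.txt should open readme.txt in its default application. The equivalent Runtime.exec() call is Runtime.getRuntime().exec(new String[] {"cmd", "/c", "start", "readme.txt"}).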

JBossMQ JMS over HTTP performance gotchas

Here's a lesson I learned recently: Don't use JMS over HTTP if you want to have anything close to high-throughput, at least not if you are using JBossMQ. This may also be the case for other providers, depending on their HTTP client implementation.

And here's why: when a JMS over HTTP client is subscribed to a JMS destination, it is actually polling the server. It will connect, receive a message, close the connection, reconnect, get the next message, and so on. That's because HTTP is a request/response protocol: the server cannot push messages to the client or call back into it the way a binary protocol over a persistent socket can. The client has to send a new HTTP request every time it wants to check for messages.

If you are receiving a lot of messages this will result in very frequent HTTP requests [1]. This causes memory and threading problems on the server as it spins up new threads to handle the requests.

In my case this is made worse by the fact that the client very frequently closes an existing JMS consumer and creates a new consumer with a different JMS selector. When that happens, the new consumer uses a new outgoing port for its HTTP connection [2]. As the client rapidly creates new consumers and makes connections, it uses up more and more ports. Windows does not release a closed connection's port right away (the port typically sits in the TIME_WAIT state for a few minutes), so under heavy load, when receiving a lot of messages and creating new consumers, the client eventually fails to connect once all ports are used up [3]. Making so many HTTP connections also causes memory and threading issues on the client.
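
The arithmetic behind the port exhaustion can be sketched in Java. A closed TCP connection usually keeps its local port in the TIME_WAIT state for a while, so a client can only open new connections as fast as ports come back. The numbers below are assumptions, not measurements from this setup: they are the defaults on older Windows versions (ephemeral ports 1025 to 5000 and a 240-second TIME_WAIT), and the rate of 50 new connections per second is hypothetical.

```java
import java.util.Locale;

class PortExhaustion {
    // Assumed defaults for older Windows versions, not measured values.
    static final int FIRST_EPHEMERAL_PORT = 1025;
    static final int LAST_EPHEMERAL_PORT = 5000;
    static final int TIME_WAIT_SECONDS = 240;

    static int portCount(int first, int last) {
        return last - first + 1;
    }

    // Each new connection holds a port until its TIME_WAIT expires, so in
    // steady state rate * timeWaitSeconds ports are busy. The pool keeps up
    // only while that stays within the number of available ports.
    static double maxSustainedRate(int ports, int timeWaitSeconds) {
        return (double) ports / timeWaitSeconds;
    }

    // Starting with no busy ports and ignoring connection lifetime, no port is
    // freed before timeWaitSeconds. A rate above the sustainable one therefore
    // exhausts the pool after ports / rate seconds.
    static double secondsUntilExhausted(int ports, int timeWaitSeconds, double ratePerSecond) {
        if (ratePerSecond * timeWaitSeconds <= ports) {
            return Double.POSITIVE_INFINITY; // the pool keeps up
        }
        return ports / ratePerSecond;
    }

    public static void main(String[] args) {
        int ports = portCount(FIRST_EPHEMERAL_PORT, LAST_EPHEMERAL_PORT);
        double rate = 50.0; // hypothetical new connections per second
        System.out.printf(Locale.ROOT, "ports: %d, sustainable: %.1f conn/s%n",
                ports, maxSustainedRate(ports, TIME_WAIT_SECONDS));
        System.out.printf(Locale.ROOT, "at %.0f conn/s, exhausted after %.1f s%n",
                rate, secondsUntilExhausted(ports, TIME_WAIT_SECONDS, rate));
    }
}
```

With these assumed values the pool has 3976 ports and sustains about 16.6 new connections per second; at 50 per second it runs dry after roughly 80 seconds. Raising the port limit only moves that threshold, while reusing a single connection removes it.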

Luckily this issue is easily addressed by switching to a different JMS protocol. With the JBossMQ UIL2 protocol, JMS traffic uses a single port and a single persistent connection. This allows the client to rapidly receive messages and create and close consumers without problems.

I thought this was an interesting problem since initially the implications of using JMS over HTTP weren't clear to me. The original idea of going over HTTP was to avoid opening an additional port on the server.

Notes:

[1] It is possible to set a property that will cause the client to wait before reconnecting to get the next message. However, this is not desirable if you want the client to process messages as quickly as possible.

[2] Using a new outgoing port may happen even if you're receiving messages using the same consumer, without closing/creating new consumers. I didn't test that case.

[3] On Windows the port limit can be increased to work around this part of the problem: KB196271

Impressions from BioIT World

I attended BioIT World last week as part of the GenoLogics crew that traveled out there. From a marketing and sales perspective it was an excellent show. Traffic at our booth was steady and it was very busy at times.

I thought the technology side of things was a little disappointing. I walked away with very little concrete new information. Most of the talks focused on what we "should" be doing, especially with regard to the semantic web and RDF technology. This is quite interesting, but there is nothing new here; I'm sure most of the audience has already heard it many times. I would have been much more interested in concrete examples of how this was actually implemented and put to use. I suppose the problem is that big pharma (which has the money and resources to actually do this) isn't interested in sharing its "secrets", since they are considered a competitive advantage.

Personally I'm still very skeptical about the semantic web and its feasibility in practice. While the technology certainly makes sense, the manual effort of unifying the many different systems and mapping them to an established common vocabulary seems almost insurmountable. This is made even more difficult by the fact that a large number of small to mid-size labs in academia do not have a proper data management system and are just working with Excel files stored in some sort of directory structure. Good luck indexing that and mapping the contents to an ontology.

The most interesting talks were around IT infrastructure for next-generation sequencing. The talks from the Broad Institute and Harvard were great. Some takeaways:

  • 1 next generation 454 sequencer generates as much data as 399 current ABI 3730s!
  • 1 gigabit networks are barely adequate for the data, 10 gigabit is the way to go
  • But in general moving that much data around the network is impractical, so they just swap disks and move them between computers. This was termed "SneakerNet". :-)
  • Even if the raw numbers add up, your infrastructure might fail due to secondary effects. For example, the Broad Institute's disk array was large enough for the data, but it failed because the processing software kept hitting the same areas of the disk. They then had to switch to clustered storage.
  • This is as much a "social" problem as a technology problem. Researcher expectations have to be reset, since we realistically cannot keep all the data around forever and the data will not always be available instantly.
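
To put the 1 vs. 10 gigabit point into numbers, here is an idealized back-of-the-envelope calculation in Java. The 1 TB data volume is a hypothetical figure, not one quoted in the talks, and the calculation ignores protocol overhead, disk throughput and contention, so real transfers would take longer.

```java
import java.util.Locale;

class TransferTime {
    // Idealized: assumes the full link rate is available end to end and
    // ignores protocol overhead, disk throughput and contention.
    static double seconds(double bytes, double bitsPerSecond) {
        return bytes * 8 / bitsPerSecond;
    }

    public static void main(String[] args) {
        double bytes = 1e12; // hypothetical 1 TB of sequencer output
        double[] linkBitsPerSecond = {1e9, 10e9};
        for (double link : linkBitsPerSecond) {
            double s = seconds(bytes, link);
            System.out.printf(Locale.ROOT, "%.0f Gbit/s: %.0f s (%.1f h)%n",
                    link / 1e9, s, s / 3600);
        }
    }
}
```

Even in this best case, one terabyte occupies a 1 gigabit link for over two hours versus about 13 minutes at 10 gigabit, which fits the speakers' preference for faster networks or simply moving disks.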