Recently I found some spare time to work on my open-source project wiki-infobox-parser, which is also available as an npm package. Feel free to try it and report any issues; PRs are welcome.

As a junior software developer, it took me about 5 hours to refactor the whole project: modularizing the functionality, writing test cases and adding more documentation. The first version of the Wikipedia Infobox parser was written last year. At that time I was still a JavaScript beginner, and I wrote the parser to learn how to program with asynchronous I/O. Coincidentally, my first big project at university (big meaning that I had to draft requirement docs, write tests, develop the application, test it and deploy it) was also a wiki parser with its own API.

As described in Wikipedia:

Wiki markup, also known as wikitext or wikicode, consists of the syntax and keywords used by the MediaWiki software to format a page. To learn how to see this markup, and to save an edit, see: Help:Editing. There is a short list of markup and tips at Help:Cheatsheet.

The cheatsheet is well explained and well structured. It has 14 main sections, including Layout, Format, Links and URLs, etc. The document is quite long, but you definitely need to read it carefully: there are many exceptions, and the structures can be nested. Since I did not intend to write a fully functional parser for wikitext, I did not use traditional techniques such as lexical analysis, parsing, syntax trees, etc. What I needed was a light parser that can extract the text shown on wiki pages.
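
To give an idea of what "light" means here, below is a minimal sketch that pulls the displayed text out of a line of wikitext with two regular expressions instead of a lexer and a syntax tree. The function name stripWikiMarkup is made up for this post and is not the API of wiki-infobox-parser. It only handles plain links, piped links and bold/italic quotes, and it does not attempt nested structures or templates.

```javascript
// Hypothetical helper for illustration only; not part of wiki-infobox-parser.
// Assumption: the input contains only simple, non-nested links and quote markup.
function stripWikiMarkup(wikitext) {
  return wikitext
    // [[Target]] -> Target, [[Target|label]] -> label
    .replace(/\[\[(?:[^\]|]*\|)?([^\]|]*)\]\]/g, '$1')
    // '''bold''' and ''italic'' -> plain text (drops runs of 2+ apostrophes)
    .replace(/'{2,}/g, '')
    .trim();
}

const sample = "'''Paris''' is the capital of [[France]] and a [[Global city|global city]].";
console.log(stripWikiMarkup(sample));
// prints: Paris is the capital of France and a global city.
```

A regex-only approach like this stays small, but every exception in the cheatsheet needs its own rule, which is exactly why tests matter so much for this kind of tool.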

After the refactoring, I started to think about a few things that I had never really considered before.

  • Test Driven Development. We all learnt this term at school, and we all know it is an important software development methodology, but it is easy to skip in practice. Whether you are adding new features or modifying existing implementations, you cannot be 100% sure that what you have done is free of bugs or will not cause a regression. Pay particular attention when other projects rely on your tool: you cannot afford to fail your own project and crash other people's applications as well. Thanks to EvanBoyle for his feedback; I really appreciate it.

  • Development and Release. When I was at the start-up Fenyin or Suimi, PMs always urged us to deploy the project no matter how complete the product was. A software developer is lucky to work with a PM who has some technical background. Do not publish code that has not been fully tested; as mentioned above, you are responsible for its quality. Do not publish every time a new feature is added either; sometimes what people want is stable code that meets the existing requirements. Of course you can tag releases and let people choose which version to use, but most of the time they just ask for the latest version with "wiki-infobox-parser": "*". Please make sure you have fully tested your code and that the new feature will not cause any regressions.

  • Pay attention to git commits. The first version of wiki-infobox-parser was developed by a JavaScript newbie, so I mistakenly committed node_modules with all its dependencies :(. Although I later noticed it, removed the files with git rm and added them to .gitignore, the git history had already grown to 8 MB. Seriously, all the scripts together only account for 12.2 KB! I had not noticed this until yesterday, when I tried to install the package from one of my other projects and it took ~30 seconds to download the dependency. Awful job. Pay attention to your commits: run git status and git diff every time before you commit. Of course there are many ways to make up for a mistake after it has been pushed, but without doubt you will always benefit from a good habit.

As a junior software developer, I still have a lot of stuff to learn. Thanks to the open-source community.