Technology Platform

In this post I will give an overview of the different components that make up the SellFire technology platform and the technologies that I am using. For my non-technical readers, this post will likely be indigestible :)

Data Feed Processing Engine

The back end of SellFire is a product data feed processing and normalization engine. SellFire will process data feed files from a multitude of affiliate networks. Each of these affiliate networks publishes its data feeds in a different format. In addition, within an affiliate network, each individual merchant may have its own quirks in its feed. For example, a merchant could be misusing columns or encoding its content inconsistently. In order to maintain a database of affiliate offers for SellFire customers to search against, all of these data feeds must be transformed into a single, normalized format and imported into a search engine. The SellFire data feed processing engine is responsible for this normalization and import.
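
To make the normalization idea concrete, here is a rough sketch of a per-merchant column mapping. It is written in Python for brevity rather than the C# the engine actually uses, and the merchant names, column names, and the `normalize_feed` helper are all made up for illustration; the real feeds and the SellFire format are surely more involved.

```python
import csv
import io

# Hypothetical per-merchant mappings: source column name -> normalized field name.
MERCHANT_MAPPINGS = {
    "acme": {"ProductName": "name", "Cost": "price", "Link": "url", "SKU": "sku"},
    "globex": {"title": "name", "price_usd": "price", "deeplink": "url", "id": "sku"},
}


def normalize_feed(merchant, raw_text, delimiter=","):
    """Yield one normalized dict per feed row, keeping only the mapped columns."""
    mapping = MERCHANT_MAPPINGS[merchant]
    reader = csv.DictReader(io.StringIO(raw_text), delimiter=delimiter)
    for row in reader:
        # Missing columns become empty strings; stray whitespace is trimmed.
        yield {
            target: (row.get(source) or "").strip()
            for source, target in mapping.items()
        }


# Two merchants with different column names, column order, and delimiters.
acme_feed = "ProductName,Cost,Link,SKU\nRed Mug, 9.99 ,http://example.com/mug,A1\n"
globex_feed = "id|title|price_usd|deeplink\nG7|Blue Lamp|24.50|http://example.com/lamp\n"

acme_rows = list(normalize_feed("acme", acme_feed))
globex_rows = list(normalize_feed("globex", globex_feed, delimiter="|"))
```

Both feeds come out with the same four keys (`name`, `price`, `url`, `sku`), which is the whole point: everything downstream only has to understand one format.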

The processing engine performs the following tasks:

  1. Recognizing when a new data feed has been published by a merchant, either by monitoring a hot folder for push activity or by regularly polling a source location.
  2. Normalizing the data feed into the format used by SellFire.
  3. Cleaning the data of any irregularities, e.g. encoding issues, unusable entries, and duplicate entries.
  4. Quality checking the data feed for accuracy. This is done automatically by ensuring that each product data feed entry A) has all required fields and B) points to a page whose content appears to be correct.
  5. Comparing the data feed against the merchant’s previous version and alerting customers whose active advertisements have changed.
  6. Exporting the normalized feed into two data repositories: the advertisement repository, which stores the full advertisement, and the search engine, which stores only the searchable components.
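
The cleaning, required-field, and comparison steps from the list can be sketched roughly as follows. This is Python rather than the engine's C#, and the field names, the `clean` and `diff_feeds` helpers, and the sample data are assumptions for illustration. The landing-page check from step 4 is left out because it needs network access.

```python
# Assumed set of required fields; the real list is not given in this post.
REQUIRED_FIELDS = ("sku", "name", "price", "url")


def clean(entries):
    """Drop entries missing a required field and duplicate SKUs (first one wins)."""
    seen = set()
    kept = []
    for entry in entries:
        if any(not entry.get(field) for field in REQUIRED_FIELDS):
            continue  # unusable entry
        if entry["sku"] in seen:
            continue  # duplicate entry
        seen.add(entry["sku"])
        kept.append(entry)
    return kept


def diff_feeds(previous, current):
    """Return the SKUs added, removed, or changed between two feed versions."""
    old = {e["sku"]: e for e in previous}
    new = {e["sku"]: e for e in current}
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in new.keys() & old.keys() if new[k] != old[k]),
    }


previous = [
    {"sku": "A1", "name": "Red Mug", "price": "9.99", "url": "http://example.com/mug"},
    {"sku": "A2", "name": "Blue Lamp", "price": "24.50", "url": "http://example.com/lamp"},
]
incoming = [
    {"sku": "A1", "name": "Red Mug", "price": "10.99", "url": "http://example.com/mug"},
    {"sku": "A1", "name": "Red Mug (dup)", "price": "10.99", "url": "http://example.com/mug"},
    {"sku": "A3", "name": "Oak Chair", "price": "40.00", "url": "http://example.com/chair"},
    {"sku": "A4", "name": "", "price": "5.00", "url": "http://example.com/x"},
]

cleaned = clean(incoming)
changes = diff_feeds(previous, cleaned)
```

The SKUs under `changed` and `removed` are the ones that would drive the customer alerts in step 5.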

The data feed processing engine is written in C# and runs as a Windows Service. I chose this technology platform mostly out of familiarity. The .NET API makes many tasks very simple. For me, the only downside to the technology choice is cost. It will cost me twice as much to run the service on a Windows machine as it would if it were written in Java and run on a Linux box. On the other hand, the service can normalize and import nearly 10,000 products a second, so there is not much need for more than one machine.

Data Storage

This is subject to change, but right now I am using three different technologies to store data.

MySQL

I am using a MySQL database to store a wide range of information about my customers, affiliate networks, merchants, and settings. For instance, the MySQL database contains all of the information about how to normalize a merchant’s data feed into the SellFire format and which data sanitization operations need to be applied.

I chose MySQL for this task because it is free, full featured, and integrates with .NET pretty seamlessly. I would have preferred to use Microsoft’s SQL Server, mostly because the GUI is much better, but it is absurdly expensive. It would cost almost 5x as much to run a SQL Server machine in the cloud as a MySQL machine.

MongoDB

I am using MongoDB as the advertisement repository. The advertisement repository will hold all of the advertisements from all imported data feeds. I expect that SellFire will quickly grow to over 50 million advertisements. I chose not to use a relational database for this task for several reasons. First, I was curious about NoSQL databases and wanted to get some experience with them. Second, some tests that I performed showed that importing and reading advertisements from a MongoDB instance would be significantly faster than doing so with a MySQL database. Lastly, the types of queries I will be doing on the advertisement repository are very simple: I will be looking up advertisements exclusively by their primary key. As it turns out, MongoDB also supports auto-sharding, which will make it easier to scale the repository over multiple machines.

Integrating MongoDB into .NET was also astoundingly easy. The official C# driver worked right out of the box, and since Mongo stores its records as JSON-like documents (BSON, a binary form of JSON), serializing complete objects into the database is as easy as can be.
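
To show what "serializing complete objects" and "lookup by primary key" amount to, here is a small sketch. It is Python rather than C#, it uses a plain dictionary as a stand-in for the MongoDB collection so it runs without a server, and the `Advertisement` fields and the `merchant:sku` key scheme are assumptions, not the actual SellFire schema. The one real detail is that MongoDB keeps a document's primary key in the `_id` field.

```python
from dataclasses import dataclass, asdict, field


@dataclass
class Advertisement:
    """Hypothetical advertisement shape, for illustration only."""
    sku: str
    merchant: str
    name: str
    price: float
    tags: list = field(default_factory=list)


def to_document(ad):
    """Convert an advertisement into a JSON-like document keyed by `_id`."""
    doc = asdict(ad)  # nested lists and dataclasses are converted as well
    doc["_id"] = f"{ad.merchant}:{ad.sku}"  # assumed primary-key scheme
    return doc


# In-memory stand-in for the advertisement collection.
repository = {}

ad = Advertisement("A1", "acme", "Red Mug", 9.99, ["kitchen"])
doc = to_document(ad)
repository[doc["_id"]] = doc

# The only query the repository needs to answer: fetch by primary key.
found = repository["acme:A1"]
```

Because the whole object is stored as one document, there is no joining or reassembling on the way back out, which fits a repository that is only ever read by key.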

Solr Search Server

The search component of SellFire will be powered by Solr. Solr is an open source search server built on the Lucene search library. I chose Solr because I have a requirement to support advanced, multi-term search. Solr supports compound searches, faceted search, replication, sharding, and much, much more. Again, integration with Solr has been surprisingly simple. Solr is also blazingly fast.
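
For a feel of what a compound, faceted search looks like on the wire, here is a sketch that builds a request URL for Solr's select handler. The `q`, `fq`, `rows`, `wt`, `facet`, and `facet.field` parameters are standard Solr query parameters; the core name `ads`, the field names, and the `build_search_url` helper are assumptions, and the code is Python rather than the C# used in SellFire. Values are not escaped for Solr's query syntax, so treat it as a sketch only.

```python
from urllib.parse import urlencode, parse_qs, urlsplit


def build_search_url(base_url, terms, merchant=None, facet_fields=(), rows=10):
    """Build a select URL that requires every term and optionally facets."""
    params = [("q", " AND ".join(terms)), ("rows", str(rows)), ("wt", "json")]
    if merchant:
        # Filter query: narrows the results without affecting relevance scoring.
        params.append(("fq", f"merchant:{merchant}"))
    if facet_fields:
        params.append(("facet", "true"))
        params.extend(("facet.field", name) for name in facet_fields)
    return f"{base_url}/select?{urlencode(params)}"


url = build_search_url(
    "http://localhost:8983/solr/ads",  # hypothetical host and core
    ["red", "mug"],
    merchant="acme",
    facet_fields=["category", "brand"],
)

# Decode the query string again to inspect what would be sent.
query = parse_qs(urlsplit(url).query)
```

Here `query["q"]` is `["red AND mug"]` and `query["facet.field"]` is `["category", "brand"]`, so a single request returns both the matching advertisements and the counts per category and brand.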

The Front End

The front end of the SellFire site consists, at a high level, of two components: the server side infrastructure and the client side infrastructure.

Server Side Infrastructure

The SellFire website is an ASP.NET MVC website written using the Razor view engine. I chose ASP.NET MVC for its simplicity and its strong separation of tiers. The Razor view engine is also a big step forward: writing complex, dynamic views is a lot easier with the abbreviated syntax that it offers.

The models and controllers of the system communicate with a business logic layer written in C#. The system communicates with the MySQL database using Microsoft’s ORM technology, Entity Framework. I chose Entity Framework over NHibernate mostly because of the built-in Visual Studio integration. On the downside, using it currently precludes me from moving over to Mono. Still, Entity Framework has been working out pretty well for me and really does alleviate the need to write a lot of boilerplate code.

Client Side Infrastructure

At the heart of SellFire’s technology platform is a WYSIWYG Affiliate Store Builder. The store builder allows affiliates to search for products and create customized product showcases in a point and click fashion. Creating the store builder requires a lot of client side code. Of course, I am using JavaScript, but I am also using two interesting technologies to make the development go a lot faster.

CoffeeScript

CoffeeScript is a great language that compiles into JavaScript. It does away with the semicolons and curly braces and replaces them with a whitespace-sensitive syntax. It also adds some niceties, including higher level looping (for each) and a class inheritance model that will be familiar to most developers. Writing in CoffeeScript feels a lot faster than writing in native JavaScript. Also, using the Web Workbench VS plugin, you get built-in compilation and syntax highlighting. On the downside, debugging a compilation issue caused by incorrect whitespace can be pretty frustrating, and the compiler seems to have a few quirks. Also, you lose VS 2010's slick JS IntelliSense features.

Knockout

Knockout is an MVVM framework for JavaScript. It allows you to build client side applications similar to the way you would build a WPF application, with a crisp separation of behavior and presentation. Knockout is a great framework, but I found it to have a steep learning curve. Not surprisingly, debugging data binding issues in Knockout is as tricky as it is in WPF and can be pretty frustrating. That being said, two weeks into client side development I am happy with the technology decision.

What do you think? What technologies would you have used or investigated?