Nebulent - software solutions


Edie (Enterprise Data Intelligence Engine) is our latest application, designed specifically to help organizations get up to speed with the Amazon Mechanical Turk platform without going through a learning curve. No need to write code or manage spreadsheets: design your tasks and push your data through our qualified workforce to get fast results at a fraction of a penny.

Contact information

Toll Free: +1(888)201-7922

Blog

View the NetSuite Web Services Usage Log

NetSuite has a very helpful feature that lets you see all web service requests and responses. This saves a lot of time when debugging an integration with NetSuite.

To see the list of calls, go to Setup -> Integration -> Web Services Usage Log (https://system.netsuite.com/app/webservices/syncstatus.nl).

2012-12-04
Making Ruby fly

In this nice blog post by Justin Kulesza, the author points out that the Ruby interpreter is compiled by RVM without any optimizations at all, and suggests adding -O3 to CFLAGS. Without a doubt, this gives your Ruby interpreter some performance boost.

But there are other tricks you can use to improve Ruby performance.

Note: you need to recompile Ruby after applying any of these “tweaks”.

Processor-specific CFLAGS

The Gentoo wiki has a section about safe performance flags for your processor. I, for example, have a Core Duo CPU, and the Intel section of the CFLAGS article says I need to use the following flags:

CFLAGS="-march=prescott -O2 -pipe -fomit-frame-pointer"

Obviously, I need to put these values where RVM will recognize them, so .rvmrc in my home directory should look like this:

rvm_configure_env=(CFLAGS="-march=core2 -O2 -pipe -fomit-frame-pointer")

Note: THESE COMPILER FLAGS ARE SAFE FOR MY PROCESSOR ONLY; YOU SHOULD FIND YOURS ON THE WIKI.

Falcon patch

There is a gist on GitHub that addresses some of the performance issues Ruby 1.9.3 has, and people are reporting up to a 2x boost in some scenarios.

Recent versions of RVM know about this patch, and all you have to do is pass a flag to rvm when installing Ruby:

rvm install 1.9.3-turbo --patch falcon

Result

Unfortunately, I did not have time to run detailed tests, but here is a small benchmark result (which is not a very good indicator of the performance difference).

The test:

time bundle exec rake routes in a big Rails project

(which is in fact the only scenario I wished were faster in my daily dev life)

ruby 1.9.3 - 30.57s

ruby 1.9.3 -O2 - 23.68s

ruby 1.9.3 -O2 +custom cflags - 23.03s

ruby 1.9.3 -O2 +custom cflags +falcon patch - 6.99s


2012-11-22
Installing Activiti Eclipse BPMN 2.0 Designer under Spring Tool Suite 3.0.0

Unfortunately, Activiti Eclipse BPMN 2.0 Designer does not install nicely under Spring Tool Suite 3.0.0. To make it work, you need to install the EMF Validation Framework and EMF Model Transaction features first.

2012-10-02
Quick start with Camel 2.9.2 + Spring using Maven archetype

mvn archetype:generate \
  -DarchetypeGroupId=org.apache.camel.archetypes \
  -DarchetypeArtifactId=camel-archetype-spring \
  -DarchetypeVersion=2.9.2 \
  -DarchetypeRepository=https://repository.apache.org/content/groups/public \
  -DgroupId=com.nebulent \
  -DartifactId=nebulent-jobs

2012-05-01
How to attach and mount an EBS volume on an EC2 instance (Ubuntu 10.10)

Taken from http://yoodey.com/how-attach-and-mount-ebs-volume-ec2-instance-ubuntu-1010

The original seems to be down, and I found it very useful in my daily routine (I am not a Linux geek :) ).

===================

Updated!
Before you do this, back up all of /var into /var-backup with: sudo rsync -avr /var/* /var-backup/
After mounting the EBS volume, you can restore /var with: sudo rsync -avr /var-backup/* /var/

Using an EBS volume can reduce the risk of sudden server crashes or “permission denied” problems on an EC2 server. The idea: after we create the instance store, we create a new EBS volume whose capacity we can set as needed. In this case, I create a 60 GB EBS volume and use it as /var on the instance. Now, let's configure our EC2 instance to use EBS as independent storage.

1. Create an EBS volume and attach it to the instance in the EC2 management console. Use the same availability zone, e.g. east-1b.
2. Log into the instance over SSH.
3. In my configuration, the EBS volume is located at /dev/sdg, so remember where yours is.

4. Run sudo fdisk -l to check that your EBS volume is attached:

Disk /dev/sdg: 64.4 GB, 64424509440 bytes
255 heads, 63 sectors/track, 7832 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

5. Format the EBS volume so we can use it, with sudo mkfs -t ext4 /dev/sdg. You will get output like:

mke2fs 1.41.12 (17-May-2010)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
3932160 inodes, 15728640 blocks
786432 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
480 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424

This can take a long time, on the order of 10-30 minutes.

6. Edit /etc/fstab and add this line:

/dev/sdg        /var    auto    defaults,nobootwait,noatime     0       0

and reboot from the EC2 management console. Now EBS backs /var for all your installed programs and web content.


2012-03-05
Changing private keys on Amazon (AWS) EC2 instances (authorized_keys)

It is unfortunate, but as CEOs of small start-up companies, we sometimes have to fire key players in our organizations. Nowadays, many IT organizations run their entire infrastructure in the cloud (Amazon AWS, for example). As a result of such unfortunate events, we need to take the necessary security measures and revoke access to our infrastructure. Even though it is easy to change the AWS console password, changing SSH keys is not that trivial. Below are the steps that need to be performed to change keys on your running AWS instances.

  1. Log into the AWS management console. Go to the EC2 tab, then select “Key Pairs”->”Create Key Pair” to generate a new private key, which will start downloading immediately. This is your private key, which you cannot afford to lose, so save it in a safe place.
  2. Upload the new key file to one of the instances where you want to change the keys. DO NOT FORGET TO DELETE THIS FILE ONCE ALL THESE STEPS ARE COMPLETED.
  3. Just in case, change permissions on the file by running "chmod 0600 FILENAME.pem".
  4. Next, run the "ssh-keygen -y" command; you will be prompted for the private key file path. Upon completion, the public key is printed to your console; save it.
  5. Change directory to /home/USER/.ssh and open the authorized_keys file for editing. Here you can either append the newly generated public key or replace the old value with it. If you append, you will be able to use both keys; I recommend doing that to make sure the new key works, then removing the old key once validated.
  6. Remember to start all your new Amazon EC2 instances with the new key pair.

2011-12-12
MongoDB findAndModify with Spring Data

A request object wrapping the parameters of MongoDB's findAndModify command:

public static class FindAndModifyRequest implements Serializable {

    private static final long serialVersionUID = 1L;

    private Query query;
    private Field field;
    private Sort sort;
    private Update update;
    private boolean remove;
    private boolean returnNew;
    private boolean upsert;

    // getters and setters omitted
}

The helper method itself; update values are converted to Mongo types, and the query is mapped against the persistent entity before the native call:

protected <T> T findAndModify(Class<T> clazz, FindAndModifyRequest request, String collectionName) {
    // convert each update value to a type Mongo can store
    DBObject updateObj = request.getUpdateObject();
    for (String key : updateObj.keySet()) {
        updateObj.put(key, mongoTemplate.getConverter().convertToMongoType(updateObj.get(key)));
    }
    // map the query fields against the persistent entity metadata
    DBObject queryObj = (request.getQuery() == null)
            ? new BasicDBObject()
            : queryMapper.getMappedObject(request.getQueryObject(),
                    mongoTemplate.getConverter().getMappingContext().getPersistentEntity(clazz));
    DBObject dbObject = getDbCollection(collectionName).findAndModify(
            queryObj,
            request.getFieldsObject(),
            request.getSortObject(),
            request.isRemove(),
            updateObj,
            request.isReturnNew(),
            request.isUpsert());
    if (dbObject == null) {
        return null;
    }
    return mongoTemplate.getConverter().read(clazz, dbObject);
}
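A hypothetical call site, assuming the usual setters on FindAndModifyRequest (omitted above); the field names and values are made up for illustration:

// claim a pending document and get the updated copy back in one atomic step
FindAndModifyRequest request = new FindAndModifyRequest();
request.setQuery(Query.query(Criteria.where("status").is("PENDING")));
request.setUpdate(new Update().set("status", "PROCESSING"));
request.setReturnNew(true);
Performance claimed = findAndModify(Performance.class, request, "performances");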

2011-11-15
Spring Data with Mongo hint command

Quite often, MongoDB fails to use existing indexes for the supplied queries (we found out the hard way), but it is possible to force MongoDB to use a certain index with the “hint” command. To do this in Spring Data, follow the example below:

{code}
public List<Performance> searchPerformancesByTitle(String accountUUID, String title) {
    Query orQuery1 = Query.query(Criteria.where("title").is(new BasicDBObject("$regex", title).append("$options", "i")));
    Query orQuery2 = Query.query(Criteria.where("fileName").is(new BasicDBObject("$regex", title + "$")));
    Query query = new Query().or(new Query[]{orQuery1, orQuery2});
    query.fields().include(NATIVE_RECORDING_UUID).include("title").include("duration").include("genre").include("artist").include("fileName"); //.include("album").include("year").include("comment");
    query.limit(getMaxTitleSearchMatches());
    return getMongoTemplate().find(query, Performance.class, new CursorPreparer() {

        @Override
        public DBCursor prepare(DBCursor cursor) {
            cursor.hint("tile_fileName_index");
            return cursor;
        }
    }, accountUUID);
}
{code}

2011-08-20
MySQL to MongoDB in 2 days

Clio

Clio is a Software-as-a-Service platform designed to process music and provide the ability to algorithmically match any song against a large set of other songs in one's library. Clio is the only software in the world that can analyze and decode the universal patterns that define musical identity and mood.

Far more advanced than cataloging simple metrics like beats-per-minute and key, Clio understands the flow of musical ideas, recognizes subtle differences between drum grooves, and identifies the unique performance styles of individual musicians. It finds and prioritizes the parts of the music that we, as listeners, find most important.

Unlike other machine learning or social recommendation-based solutions, Clio’s technology intuits the difference between Lady Gaga and Ravi Shankar and can find music that sounds (and feels!) like either one.

Original Architecture

Originally, the data architecture for the Clio platform was designed with a relational mindset. At the time of the initial design, the software was preserving some metadata and a single record of analysis data per song. Taking into account that there are approximately 46,002,354 songs according to Gracenote (note that the iTunes Music Store has only 2.5% of these songs available), most relational databases would be able to efficiently store and process that amount of data. As the system grew and the algorithms matured, we ended up with thousands of records per song, which now included the entire set of DSP data (notes, beats, etc.).

With a release deadline fast approaching, we hit a wall of data, counting billions of records for any of our medium-size clients. Without wasting much time, a decision was made to look into non-SQL databases. We had addressed similar problems before by implementing a Lucene/Solr farm layer on top of relational data, but the lack of support for hierarchical, document-based structures pointed us directly to MongoDB. It was our luck that the nature of the application defined all audio data as a large self-contained set (document) per song (as illustrated below).

Relational Model

Magic of Spring Data

So, we found ourselves with the original JPA-based domain model, a DAO layer, and lots of service code around it. Luckily, SpringSource comes to the rescue with their Spring Data project and its support for MongoDB, among other non-SQL databases.

From the model above, the performance table naturally becomes our MongoDB BSON document, and all we need to do is add the Spring Data @Document annotation at the top of the Performance JPA bean, as illustrated below.

JPA Bean for Performance class

Note: to make MongoDB function correctly, according to Spring Data every object in the document must have an id class property defined, which will serve as the primary key. Even though in theory one can use the @Id annotation on a primary-key attribute named something other than id, we found that approach is currently not functional. As it turned out, we got lucky even with primary keys, since they were all named id and each table was normalized to have no composite primary keys (which also do not work as expected in JPA).
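Since the original bean listing was an image that no longer renders, here is a minimal sketch of the idea; the collection name and the title/fileName fields are assumptions, not the actual Clio model:

import java.io.Serializable;
import org.springframework.data.annotation.Id;
import org.springframework.data.mongodb.core.mapping.Document;

@Document(collection = "performances") // hypothetical collection name
public class Performance implements Serializable {

    private static final long serialVersionUID = 1L;

    @Id
    private String id; // must be named "id" (see the note above)

    private String title;
    private String fileName;

    // getters and setters omitted
}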

Now, all you need to do to start interacting with MongoDB is define a mongo template, as illustrated below.

Spring Data mongoTemplate
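The original configuration screenshot is also missing; a minimal sketch of the wiring in plain Java (host, port, and database name are placeholders, using the Spring Data MongoDB 1.x API) would look roughly like this:

import com.mongodb.Mongo;
import org.springframework.data.mongodb.core.MongoTemplate;

// connect to a local mongod and bind the template to a database
Mongo mongo = new Mongo("localhost", 27017);
MongoTemplate mongoTemplate = new MongoTemplate(mongo, "clio"); // "clio" is a placeholder database name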

Interacting with MongoDB in Spring Data

Collection Size

One of the things that was not apparent in the beginning was how to get a count of documents in a collection without actually retrieving any data, similar to the MongoDB shell command below:

db['my.collection-1'].find().count()

To achieve this in Spring Data, simply do this:

mongoTemplate.getCollection("my.collection-1").count()

Like Queries

If you need to run a case-insensitive LIKE-style search, you should not use the regex(String) method of the Criteria class; use BasicDBObject instead, as shown below:

Query query = Query.query(Criteria.where("title").is(new BasicDBObject("$regex", title).append("$options", "i")));

Use Indexes

Indexes do help! After defining composite indexes that matched the parameters of our heaviest queries, we drastically improved performance. But keep in mind that there is a limit on the number of attributes you can define in a composite index. An example declaration follows below.
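For illustration, Spring Data lets you declare a composite index on the document class with the @CompoundIndex annotation; the index name and field list here are hypothetical, not Clio's actual indexes:

import org.springframework.data.mongodb.core.index.CompoundIndex;
import org.springframework.data.mongodb.core.mapping.Document;

// hypothetical composite index matching a query that filters on
// accountUUID and title together
@Document
@CompoundIndex(name = "account_title_idx", def = "{'accountUUID': 1, 'title': 1}")
public class Performance {
    // ...
}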

Deployment and Setup

Amazon EC2

It's a fact that MongoDB requires lots of RAM, but during numerous tests we found that it also consumes lots of CPU on Amazon EC2 instances. Based on our tests, the most stable EC2 instance type to use when you have over 100GB worth of data (after export) would be High-CPU Extra Large.

High Availability and Scalability

In order to address scalability and availability concerns, the proposed solution would implement a replicating cluster of MongoDB databases distributed across multiple availability zones. This would substantially improve the up-time SLA for the database services. In addition, the implementation would benefit from a horizontally scalable architecture, having 2-3 databases processing the load instead of just one, sitting behind a regional load balancer. This benefit comes at a cost, however, since more servers would have to be operated and each of them would have its own copy of the database. Network charges will not apply since all network traffic within the Amazon cloud is free.

Pitfalls on the way

  • In the case of composite keys, MongoDB will complain during serialization of the field in the current version of Spring Data. Also pay attention to the fact that you can only use "Long", "String", "BigInteger", or "ObjectId" as the MongoDB data type for the primary key of a document.
  • MongoDB cannot serialize the "Character" data type, so do not use "char" or "Character" in your Java beans. Either change such fields to String or use enums, which serialize just fine (see below). This is another issue with Spring Data.

Enumeration
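The original enumeration screenshot is gone; a minimal sketch of the workaround, with hypothetical type and field names:

// instead of a char/Character field, which Spring Data cannot serialize,
// model single-character codes as an enum (or a String)
public class Performance {

    public enum Grade { A, B, C } // hypothetical replacement for a Character field

    private Grade grade; // was: private Character grade;
}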

  • No support for transactions.
  • Spring Data currently does not permit setting values for multiple fields in a single call, so in case you want to update multiple fields of a MongoDB document you have two options (see the sketch after this list):
    1. Save the entire document (if the document exists, MongoDB will overwrite it).
    2. Perform multiple updates (one for each field you want to update) using “mongoTemplate”.
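A minimal sketch of option 2 under that constraint; the query, field names, and values are made up for illustration:

import org.springframework.data.mongodb.core.query.Criteria;
import org.springframework.data.mongodb.core.query.Query;
import org.springframework.data.mongodb.core.query.Update;

// one update call per field, since setting multiple fields in a
// single Update was not supported at the time
Query byId = Query.query(Criteria.where("id").is(performanceId)); // hypothetical id value
mongoTemplate.updateFirst(byId, Update.update("title", newTitle), Performance.class);
mongoTemplate.updateFirst(byId, Update.update("genre", newGenre), Performance.class);
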
2011-05-14