
Ruby On Rails – Thinking Sphinx


Sphinx is an open source search server used for full-text search. We have used it with Rails and a MySQL database.


Why Sphinx:

  • Blazingly fast indexing and searching
  • A variety of text-processing features that are easy to adapt to the application
  • Scaling: it can scale up to millions of queries per day
  • Performance: Sphinx indexes up to 10-15 MB of text per second per single CPU core
  • Easy to write SQL-style queries
  • Grouping and clustering
  • Distributed searching

Thinking Sphinx:

Thinking Sphinx is a Ruby gem that connects Sphinx with Rails Active Record. Through Thinking Sphinx we can attach our models to Sphinx.

Example: suppose we have a Country model with the fields:
id, name, about, continent_id, capital, rating_no

class Country < ActiveRecord::Base 
  has_many :places_to_visits 
end 
PlacesToVisit fields: name, country_id
class PlacesToVisit < ActiveRecord::Base 
  belongs_to :country 
end 

We want full-text search on the fields of Country (everything except id) and on the names of its places to visit (the has_many association).

How to Use:
We add the index definition to the Country model:

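class Country < ActiveRecord::Base
  has_many :places_to_visits

  define_index do
    # fields: searched as full text
    indexes name, :sortable => true
    indexes about
    indexes capital
    indexes places_to_visits(:name), :as => :place, :sortable => true

    # attributes: used for filtering (:with), sorting and grouping;
    # rating_no is an attribute so that range filters can be applied to it
    has rating_no
    has continent_id
  end
end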

You can read more about indexing in the Thinking Sphinx documentation.

Searching example

Suppose we want to search countries with rating between 5 and 10.

  Country.search params[:search], :with => {:rating_no => 5..10} 

Here :with is used for filtering; it accepts Ruby arrays and ranges.

A typical example with some advanced options:

Country.search(params[:search], :match_mode => :all,
                                :field_weights => {:name => 20, :about => 10},
                                :with => {:rating_no => 5..10},
                                :conditions => {:continent_id => 1},
                                :page => params[:page], :per_page => 3)

We have provided field weights here so that, when the search is applied, name has higher priority than about. :conditions restricts the full-text search to specific fields, while :with filters the search results by attribute values.

We paginate by passing the :page and :per_page parameters.
:match_mode controls how the keywords are matched against the fields:
:all – returns the records that contain all of the keywords
:any – returns the records that contain any of the keywords
:boolean – lets us use boolean operators between keywords, for example:

Country.search 'India | USA', :match_mode => :boolean
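
To make the difference between :all and :any concrete, here is a plain-Ruby sketch that mimics the two modes over an in-memory array. It does not use Sphinx at all; the matches? helper and the sample country names are made up for this illustration, and real Sphinx matching (stemming, ranking and so on) is far richer.

```ruby
# Plain-Ruby illustration of :all vs :any matching (not Sphinx itself).
# `matches?` is a hypothetical helper written only for this sketch.
def matches?(text, query, mode)
  words    = text.downcase.scan(/\w+/)
  keywords = query.downcase.scan(/\w+/)
  case mode
  when :all then keywords.all? { |k| words.include?(k) }
  when :any then keywords.any? { |k| words.include?(k) }
  else raise ArgumentError, "unsupported mode: #{mode}"
  end
end

countries = ["Republic of India", "United States of America", "Sri Lanka"]

# :all keeps only the records that contain every keyword
all_hits = countries.select { |c| matches?(c, "republic india", :all) }

# :any keeps the records that contain at least one keyword
any_hits = countries.select { |c| matches?(c, "india lanka", :any) }
```

With this sample data, all_hits contains only "Republic of India", while any_hits contains both "Republic of India" and "Sri Lanka".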

Hope this post is useful to you

Thanks for Reading it.

How to serialize (marshal) objects in Ruby

It often happens that we want to store data in persistent storage so that we can use it again later.

Marshalling

Marshalling is the process by which we can serialize and deserialize Ruby objects in our Ruby programs.

The stored data can be read back and the original object can be reconstituted.

So, in marshalling we record the state of the object (its class name and instance data, not its code) in such a way that, when it is unmarshalled, an equivalent object with the same data is rebuilt. The class itself must already be defined in the program that loads the data.
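
As a quick illustration of that round trip, here is a minimal sketch using Marshal.dump and Marshal.load from the Ruby core library. The Account class and its values are made up for the example.

```ruby
# A small class used only for this illustration.
class Account
  attr_reader :name, :roles

  def initialize(name, roles)
    @name  = name
    @roles = roles
  end
end

original = Account.new("alice", [:admin, :editor])

bytes    = Marshal.dump(original)  # serialize the object's state to a binary String
restored = Marshal.load(bytes)     # rebuild a new object with the same state
```

restored is a different object from original, but it carries the same name and roles.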

A more general explanation is given here.

Suppose we have data

users_data = {
               "users_status" => ['inactive', 'active', 'suspended', 'deleted'],   
               "default_settings" => [{:send_notification=> 'true', :subscribe_to_news_letter => '1'}]
            }

I'll marshal and Base64-encode the data so that it can be stored in a database.

So I can do something like:

users_data.keys.each do |key|
  UserData[key.to_s] = users_data[key]
end

In the UserData model, I can write something like:

require 'base64'

# UserData(kind, value, created_at, updated_at)
class UserData < ActiveRecord::Base

  # UserData[kind] = value : marshal, Base64-encode and save the value
  def self.[]=(kind, value)
    user_data = find_by_kind(kind.to_s) || new(:kind => kind.to_s)
    user_data.value = Base64.encode64(Marshal.dump(value))
    user_data.save
  end

  # UserData[kind] : decode and unmarshal the stored value, or nil if absent
  def self.[](kind)
    user_data = find_by_kind(kind.to_s)
    user_data ? Marshal.load(Base64.decode64(user_data.value)) : nil
  end
end
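
If you want to try the encode and decode steps without a database, the following sketch may help. InMemoryUserData is a hypothetical stand-in that keeps the Base64 text in a Hash instead of a table; it is not part of the model above, only the same idea in miniature.

```ruby
require 'base64'

# Hypothetical in-memory stand-in for the UserData table, so the
# encode/decode steps can be tried without ActiveRecord.
class InMemoryUserData
  @rows = {}  # kind => Base64 text, as it would sit in a text column

  class << self
    attr_reader :rows

    # InMemoryUserData[kind] = value
    def []=(kind, value)
      @rows[kind.to_s] = Base64.encode64(Marshal.dump(value))
    end

    # InMemoryUserData[kind], or nil when nothing is stored for that kind
    def [](kind)
      encoded = @rows[kind.to_s]
      encoded ? Marshal.load(Base64.decode64(encoded)) : nil
    end
  end
end

InMemoryUserData["users_status"] = ['inactive', 'active', 'suspended', 'deleted']

statuses = InMemoryUserData["users_status"]  # the original array, rebuilt
missing  = InMemoryUserData["unknown_kind"]  # nil, nothing stored
```

Base64 is used because Marshal.dump produces binary data, which is awkward to keep in a plain text column.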

This is a very good use case with very simple code. Please let me know if anything is difficult to understand.

XSS

Yesterday I was discussing security in Rails. The reach of the web has expanded, so security in web apps can't be overlooked.

I came to know about some web security issues.

Cross-Site scripting

I have taken this from the Rails security guide.

Nowadays we have a lot of user-generated content on our websites, for example posted comments and search queries (in what we call Web 2.0 style apps).

Definition

In cross-site scripting, an attacker injects a malicious script into a website. The script gets past the security mechanisms applied by the browser on the client side because it arrives from a trusted site: the browser treats the injected input as part of the target page. In this way the attacker can access cookies or other information belonging to the legitimate user.

Prevention

– The main issue is unvalidated user input: when our website displays that input, the malicious script runs in the browser as part of the page, so the input must be filtered or escaped before it is displayed.
If we do not allow users to enter any HTML, prevention is easy, but in some cases HTML input is needed.
In Rails, the h() method escapes all special HTML characters.
For example, as you probably already know:

<% for comment in @article.comments %>
  <%=h comment %>
<% end %>
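
To see what the escaping does outside a view, here is a minimal sketch using ERB::Util.html_escape from the Ruby standard library (h is an alias of this method). The comment string is a made-up example of hostile input.

```ruby
require 'erb'

# Made-up example of user input that carries a script tag.
comment = "<script>alert(1)</script>"

# html_escape replaces the special HTML characters with entities,
# so the browser shows the text instead of running it.
escaped = ERB::Util.html_escape(comment)
# escaped is "&lt;script&gt;alert(1)&lt;/script&gt;"
```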

It is a good approach to store the content in its original, unescaped form and to escape it when it is displayed.

Of course, things are not always that simple; sometimes filtering the input is difficult.
Rails provides more help here through the sanitize() method; I looked it up on railsbrain, the bible of Rails.
This method removes JavaScript and form tags, and it is used just like h().
We can also customise it:

  <%= sanitize @user.bio, :tags => %w(img), :attributes => %w(id class style) %>

How to exclude files in rcov (a code coverage tool in Ruby)

Today I was working on increasing the test coverage of one of my projects, and I wanted to exclude some files from the report because they are unnecessary.

I am using the RSpec framework for testing. The spec folder contains a file named rcov.opts, which has an --exclude option. We can append the paths of our files to it. For example, I wanted to exclude the helpers and sweepers folders, so I set --exclude "helpers/*,app/sweepers/*".

When I then ran rake spec:rcov, the excluded files no longer appeared in the statistics in the generated index.html file.

Parsing Json In Ruby

require 'net/http'
require 'rubygems'
require 'json'

@response = Net::HTTP.get(URI.parse("http://www.reddit.com/.json"))
@result = JSON.parse(@response)

The parsed data is now in @result; using ordinary Ruby Hash and Array methods, you can extract whatever you need from it.
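
As a rough sketch of that last step, the example below walks a parsed result with plain Hash and Array methods. It uses a hardcoded JSON string instead of a live HTTP call, and the "data" → "children" → "data" → "title" layout is my assumption about the shape of the feed, so adjust the keys to whatever the real response contains.

```ruby
require 'json'

# Hardcoded sample standing in for the HTTP response body.
# The key layout here is an assumption, not a guaranteed format.
sample = <<-JSON_TEXT
{
  "kind": "Listing",
  "data": {
    "children": [
      { "kind": "t3", "data": { "title": "First post",  "score": 10 } },
      { "kind": "t3", "data": { "title": "Second post", "score": 25 } }
    ]
  }
}
JSON_TEXT

result = JSON.parse(sample)  # nested Hashes and Arrays with String keys

# Collect every title, then pick the entry with the highest score.
titles = result["data"]["children"].map { |child| child["data"]["title"] }
top    = result["data"]["children"].max_by { |child| child["data"]["score"] }
```

JSON.parse returns Hashes with String keys by default, which is why the lookups use "data" rather than :data.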