GroupWise to SalesForce Database Migration Experience

by Seth Miller

SalesForce contacts sales manager.
Manager buys SalesForce.
Manager recruits salespeople to migrate existing CRM data to SalesForce.
Manager makes promises he can’t keep.
Everyone comes begging DBA to fix problem on a deadline.

This type of situation is nothing new to a DBA, especially one in a smaller shop. Here’s the background: our current CRM is ContactWise, a Novell product, running on an Oracle 9i database with a thick client front end.

The data to be migrated from ContactWise into SalesForce was exported using canned Excel spreadsheet reports through the client interface. I heard this was going on while they were doing it and warned my manager that if they were planning on migrating additional data at a later time, they had better include a unique identifier with their reports so there would be something to cross-reference. That warning apparently fell on deaf ears because no such identifier was included.

They got all the contacts and accounts loaded into SalesForce and then realized that they had not included any of the historical notes that go along with them. All they had were names and addresses, which did them little good when dealing with existing clients.

Sure enough, I got a meeting invite to “Discuss ContactWise Notes Migration” shortly after management high-fived each other on “Moving to the Cloud”. You can probably figure out the rest. I was asked how we could get all of the notes from ContactWise to their respective owners in SalesForce. My short answer was, “You can’t”. The best you can do is match on as many unique names as possible and hand-enter the rest. That was two weeks ago and I am still working on it.

There were a few issues that popped up that I wasn’t expecting, which prompted me to write this post. The biggest was that the notes had RTF markup that needed to be eliminated before migrating them. Thanks to Tom Kyte and an Oracle feature called “Oracle Text”, I was able to do this in batch and with very little scripting. More about that later.

The first thing to do was to get the contact names and unique IDs out of SalesForce (SF). SF has a neat little tool called Apex Data Loader (ADL). It is an installable client-side program that spits out your data in CSV format and can also load it from CSV files. There are not a lot of bells and whistles, but it works.
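For what it’s worth, the export side of ADL is driven by a simple SOQL query. The one for this extract would have looked something like the following (these are all standard fields on the Salesforce Contact object, nothing custom):

SELECT Id, AccountId, LastName, FirstName, Name FROM Contact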

The first snag was that the password I tried to use to log ADL into SF did not work. After some research I came across a link in the SF Knowledge Base explaining that I needed to generate a key for my PC and add it to the end of my password when logging in.

The contact extract from SF consisted of the unique SF ID, the ID of the account the contact belonged to, last name, first name, and combined name (for ease of use). Then I needed to create a sqlldr parameter file to load the data into the ContactWise (CW) database.

Sqlldr Parameter File “A”

load data
  infile 'contact_extract.csv'
  into table sf_contacts
  fields terminated by "," optionally enclosed by '"'
  (id,accountid,lastname,firstname,name)
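The sf_contacts staging table the parameter file loads into isn’t shown anywhere above; a minimal sketch of it would be the following (column names match the parameter file, column sizes are my assumptions; SF IDs in the extract were 18 characters):

create table sf_contacts
(   id        varchar2(18),   -- unique SF ID
    accountid varchar2(18),   -- SF account the contact belongs to
    lastname  varchar2(255),
    firstname varchar2(255),
    name      varchar2(255)   -- combined name, for ease of use
);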

Sql Statement “A”

select id, na_fname firstname, na_lname lastname
,translate(hi_subject,',',' ') subject
,translate(no_note,',',' ') note, hi_date||'|' 
from cw_note
join cw_history on (no_key=hi_no_key)
join cw_name on (hi_na_key=na_key)
join sf_contacts on (lower(lastname) = lower(na_lname) and lower(firstname) = lower(na_fname))
where dbms_lob.getlength(no_note) < 1000
and no_note not like '%set the associated company for this contact to%'
and no_note not like '%changed the associated company for this contact from%'
and no_note not like '%" created "%'
and lower(name) not in
(
select lower(name) name from sf_contacts group by lower(name) having count(*) > 1
)
order by lastname desc, firstname, hi_date;

Now that I have the contacts that were transferred out of CW into SF back in CW, I can cross-reference and see what matches. I eliminated any contacts without notes by creating an inner join on cw_note and cw_history, which creates the one-to-many relationship from contact name to notes within CW. The only thing to cross-reference between the CW and SF databases at this point is first name and last name, so I join cw_name to sf_contacts on these two columns using the lower function, on the off chance that case was changed during the migration or by manipulation of the data afterward.
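A quick count along these lines (a throwaway sanity check, not part of the migration itself) shows how much of the contact list matches on name at all before you commit to the approach:

select count(*)
from sf_contacts s
where exists
(select 1
 from cw_name c
 where lower(c.na_lname) = lower(s.lastname)
 and lower(c.na_fname) = lower(s.firstname));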

The actual note data is contained in a clob column because it could be either text or an attached document. The folks that requested this data were told that no documents would be transferred, only text. The dbms_lob.getlength function eliminates any documents, since a document converted to text will almost always exceed 1000 characters.
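If you want to see how many notes the 1000-character cutoff classifies as documents before running the real extract, a quick breakdown (again, just a throwaway check) looks like:

select case when dbms_lob.getlength(no_note) < 1000
            then 'text' else 'document' end note_type
      ,count(*)
from cw_note
group by case when dbms_lob.getlength(no_note) < 1000
              then 'text' else 'document' end;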

The first batch of this project was to find any contact cross-referenced between the two databases with attached notes that was unique on first name and last name. The “having” clause eliminates any contacts that are not unique on first name and last name.

This initial batch gave me about 90% of the notes that needed to be migrated, along with first name, last name, subject, and date. They were then exported into a pipe-delimited file, using the translate function to eliminate commas in the note and subject. The commas were eliminated because ADL only loads CSVs.

As mentioned before, the note column was marked up with RTF, which CW translates automatically and SF does not, so I needed to find a way to eliminate it. Since there were some 40,000-odd records, eliminating it by hand was not an option.

A little research revealed a Tom Kyte post about Oracle Text and using a filter to eliminate RTF markup. It wasn’t quite what I was looking for, but it was enough to piece together what I needed. I didn’t want to do this in the CW database because I didn’t want to add tables there, and I needed a 10g database, so a new schema was created in an existing dev database.

Sql Statement “B”

define MAKEUSER = 'TEST'
define USERPASS = 'TEST'
define USERTBS = 'TEST'

create user &MAKEUSER identified by "&USERPASS"
  default tablespace &USERTBS quota unlimited on &USERTBS;
-- The schema needs to log in and create its own tables
grant create session, create table to &MAKEUSER;
grant execute on "CTXSYS"."CTX_DDL" to &MAKEUSER;
-- (depending on version, you may also need execute on CTXSYS.CTX_DOC)

Then I needed a couple of tables: one for the source data and one for the formatted data.

Sql Statement “C”

CREATE TABLE NOTES
(   "ID"            VARCHAR2(50),
    "FIRSTNAME"     VARCHAR2(255),
    "LASTNAME"      VARCHAR2(255),
    "SUBJECT"       VARCHAR2(255),
    "HI_DATE"       VARCHAR2(255),
    "NOTE"          VARCHAR2(1000),
    "FILTER_ID"     NUMBER,
    "ERROR_NUMBER"  NUMBER,
    "ERROR_MESSAGE" VARCHAR2(100));

create table filter ( query_id number, document clob );

The next step was to load the notes data into the test environment I just created. If you look closely at Sql Statement A, you can see I appended a pipe symbol to the last column. More on that in a second.

I ran Sql Statement A in Sql Developer to generate the data. It was then exported to a pipe-delimited file, which I transferred to the test server. When looking at it in vim, I saw the ubiquitous “^M” (caret M) at the end of each line, so I ran the file through “dos2unix”.
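That cleanup is a one-liner, assuming the export file is named notes_export.csv to match the infile in Parameter File B below:

dos2unix notes_export.csv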

Through a lot of trial and error I found out that some tweaking needs to be done to the sqlldr parameter file to get the data to load properly. The notes column contains RTF markup as well as carriage returns, and the standard sqlldr parameters interpret a carriage return as the end of the record, which screws everything up.

Sqlldr Parameter File “B”

options (BINDSIZE=20971520, READSIZE=20971520)
load data
  infile 'notes_export.csv' "str '|;'"
  into table notes
  replace
  fields terminated by "|" optionally enclosed by '"' trailing nullcols
  (id,firstname,lastname,subject,note char(1000),hi_date)

Sqlldr lets you modify the end-of-record character by adding it to the end of the “infile” parameter. As you can see in Sqlldr Parameter File B, I modified it to “str ‘|;’”. “str” tells sqlldr to load the data in stream format so it doesn’t use the carriage return to indicate end of record. “‘|;’” tells sqlldr that the end of record will be indicated by the combination of a pipe symbol and a semicolon. You can change this to whatever you want; I picked this combination because I knew it would not appear in the data itself.

The pipe-delimited file needed to be tweaked a little more before I could load the data. The end of record as it was spit out of Sql Developer looked like this:

"00360000013IyTRAA0"|"andrew"|"wemhoener"|"L introduction letter sent"|"{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fnil MS Sans Serif;}}
\viewkind4\uc1\pard\f0\fs20 A merge document was created from the\par
C:\\Documents and Settings\\All Users\\Desktop\\ContactWise\\Quote Template.doc\par
document on Monday  February 18  2008 6:16 PM.\par
\par
\par
\par
}
"|"18-FEB-08 05.16.20 PM|"

You can see the RTF markup, and that the record ends with a pipe and a double quote. I need the record to end with a double quote to enclose the last column, followed by the pipe/semicolon combination to indicate end of record. This nifty global search and replace in Vim gave me exactly what I needed:

%s/|"$/"|;/g

The same record now looks like this:

"00360000013IyTRAA0"|"andrew"|"wemhoener"|"L introduction letter sent"|"{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fnil MS Sans Serif;}}
\viewkind4\uc1\pard\f0\fs20 A merge document was created from the\par
C:\\Documents and Settings\\All Users\\Desktop\\ContactWise\\Quote Template.doc\par
document on Monday  February 18  2008 6:16 PM.\par
\par
\par
\par
}
"|"18-FEB-08 05.16.20 PM"|;

The other thing that needs to be addressed is the “options (BINDSIZE=20971520, READSIZE=20971520)” parameter, which increases the array size for larger file loads. “char(1000)” is added to the note column because the default datatype and size for a column is “char(255)”, which wasn’t big enough for the 1000-byte column I was trying to load.

Now the data can be loaded into the notes table using Sqlldr Parameter File B. Double-check the rows once they are loaded into the table to check for skewed columns in case there was a stray pipe symbol somewhere in your data.
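A typical invocation would look like this, assuming Parameter File B is saved as notes_load.ctl and we are loading into the TEST schema created earlier (both names are my placeholders):

sqlldr userid=test/test control=notes_load.ctl log=notes_load.log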

Now we start getting into the Oracle Text stuff. The first thing to do is create a Text index on the note column of the notes table so we can run a filter on it. This can take quite a while depending on how many rows are in the notes table.

CREATE INDEX notes_idx ON notes(note)
  indextype is ctxsys.context
  parameters('DATASTORE CTXSYS.DEFAULT_DATASTORE FILTER CTXSYS.AUTO_FILTER');
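Once the index build finishes, it is worth checking whether Oracle Text choked on any rows; the CTX_USER_INDEX_ERRORS view records them:

select err_timestamp, err_textkey, err_text
from ctx_user_index_errors
order by err_timestamp;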

Create a unique id on each row of the notes table so it can be cross-referenced with the filter table once the filter has been run.

UPDATE notes
   SET filter_id = (SELECT rn 
                  FROM (SELECT rownum AS rn, row_id 
                          FROM (SELECT rowid AS row_id 
                                  FROM notes b) 
                        )
                 WHERE row_id = notes.rowid);
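As an aside, if nothing else is touching the table, a plain ROWNUM assignment gets you the same sequential numbering with a lot less typing:

update notes set filter_id = rownum;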

Create a unique index on the filter_id column of the notes table so the CBO is not doing full scans just for a unique index join.

CREATE UNIQUE INDEX PK_NOTES ON NOTES (FILTER_ID);

Now we can run the filter on the note column while at the same time transferring over the unique id for cross-reference and properly handling errors if there are any.

declare
  seq          rowid;
  notes_index  varchar2(20) := 'NOTES_IDX';
  filter_table varchar2(20) := 'FILTER';
  err_num      NUMBER;
  err_msg      VARCHAR2(100);

  type note_type is table of notes%ROWTYPE
    index by binary_integer;
  note_arr note_type;

begin

  /* Create array, eliminating any records that have already been done
     so this loop can be run multiple times */
  select * bulk collect into note_arr
    from notes
   where not exists
         (select 1
            from filter
           where filter.query_id = notes.filter_id);

  /* Guard against an empty collection on re-runs where everything is done */
  if note_arr.count > 0 then
    for i in note_arr.first..note_arr.last loop
      begin

        /* ctx_doc.filter uses the rowid of the column to match to the index record ingested */
        select rowid
          into seq
          from notes
         where filter_id = note_arr(i).filter_id;

        /* Run the filter using the Text index created; TRUE requests plain text output */
        ctx_doc.filter( notes_index, seq, filter_table, note_arr(i).filter_id, TRUE );

      exception
        when others then
          err_num := SQLCODE;
          err_msg := SUBSTR(SQLERRM, 1, 100);

          /* Populate the offending row with the error information for later review */
          update notes
             set error_number  = err_num
                ,error_message = err_msg
           where filter_id = note_arr(i).filter_id;
      end;
    end loop;
  end if;

  /* Persist the filter results and any error updates */
  commit;
end;
/

You should now have the before-filter version of each note in the notes table and the after-filter version (sans RTF markup) in the filter table. Connect Sql Developer to this test schema, if you haven’t already, to export this information to a CSV file for loading into SF.
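A quick side-by-side of a few rows makes it easy to eyeball the results before exporting (dbms_lob.substr pulls a readable chunk out of the clob):

select n.filter_id
      ,substr(n.note, 1, 60)              raw_rtf
      ,dbms_lob.substr(f.document, 60, 1) plain_text
from notes n
join filter f on (f.query_id = n.filter_id)
where rownum <= 10;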

Check the error_number and error_message columns of the notes table to make sure there were no failures in the filter process. I have had a couple due to funky data, but for the most part the filtering worked perfectly.
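Surfacing the offenders is a simple null check:

select filter_id, error_number, error_message
from notes
where error_number is not null
order by filter_id;

With any failures accounted for, you can select from the notes and filter tables to pull the data back out to load into SF.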

select id parentid
,substr(to_char(to_date(hi_date, 'DD-MON-YY HH.MI.SS AM'), 'YY/MM/DD')||' - '||subject, 1,80) title
,document body
from filter join notes on (query_id = filter_id)
where document is not null
and error_number is null
-- and query_id between 1 and 5000
order by lastname, firstname, to_date(hi_date, 'DD-MON-YY HH.MI.SS AM') desc;

You may want to uncomment the query_id predicate and do your load in batches if you have tens of thousands of rows. Export your data to an xls-format spreadsheet rather than straight to a CSV. Why? Because of the carriage returns in the note column: Sql Developer does not properly double-quote the columns, so they are not properly delimited.

Open the xls file just created in Excel or OpenOffice and save as a standard CSV file. This file will be properly formatted to load into SF.