Here's the problem:
The products we sell have a product name, each product has one or more
grades of the product (which usually relate to the quality of it) for each
grade we have a package size and a unique number called a quot;SKUquot; that
identifies that grade and package size in our system.
I need to take the source code of a page with all of this information and
turn it into an excel sheet with columns titled quot;productquot; quot;gradequot; quot;packagingquot;
quot;skuquot; for each of the grades. The format of the html source code throws in
all sorts of characters that I need to get rid of. It also lumps it into one
block text with no line breaks. The below text file shows the format of two
products, the first product has 2 grades and the second product has 3 grades.
How can I use macros, formulas, anything to get the block of text into the
format I want. It would take me years if I go in by hand and remove the
unwanted characters and manually copy and paste them into cells. Please help
I would appreciate it so so so much. Is this even possible (or feasible).
There is no way I can get any other form of the data.
Here is an example of what I have:
www.savefile.com/files/7119995
Example of what I need:
www.savefile.com/files/9420878
Happy Holidays Everyone, and a huge thank you for all of the people who help
people on this site!
ASAP Utilities has a feature that will remove any selected character(s) from
text......it's free from www.asap-utilities.com
Vaya con Dios,
Chuck, CABGx3quot;Dornquot; gt; wrote in message
...
gt; Here's the problem:
gt;
gt; The products we sell have a product name, each product has one or more
gt; grades of the product (which usually relate to the quality of it) for each
gt; grade we have a package size and a unique number called a quot;SKUquot; that
gt; identifies that grade and package size in our system.
gt;
gt; I need to take the source code of a page with all of this information and
gt; turn it into an excel sheet with columns titled quot;productquot; quot;gradequot;
quot;packagingquot;
gt; quot;skuquot; for each of the grades. The format of the html source code throws
in
gt; all sorts of characters that I need to get rid of. It also lumps it into
one
gt; block text with no line breaks. The below text file shows the format of
two
gt; products, the first product has 2 grades and the second product has 3
grades.
gt; How can I use macros, formulas, anything to get the block of text into
the
gt; format I want. It would take me years if I go in by hand and remove the
gt; unwanted characters and manually copy and paste them into cells. Please
help
gt; I would appreciate it so so so much. Is this even possible (or feasible).
gt; There is no way I can get any other form of the data.
gt; Here is an example of what I have:
gt;
gt; www.savefile.com/files/7119995
gt;
gt; Example of what I need:
gt;
gt; www.savefile.com/files/9420878
gt;
gt; Happy Holidays Everyone, and a huge thank you for all of the people who
help
gt; people on this site!
Well, you might be better taking up Chuck's offer ...
I've looked at your text file in Notepad. With word-wrap on and a few
hard-returns, you can see the structure in there quite easily and some
codes are fairly obvious - amp;quot; for quot;, amp;lt; for Left Tab etc. You
could do successive Find amp; Replace within Notepad to tidy up the file
quite a bit. However, there are other codes - presumably c2, c3 etc
relate to column 2, column 3 - so it's really down to how well you can
recognise patterns as to how long it will take you.
I think the general strategy would be to use Notepad (or Wordpad) to
convert this into a format which could ultimately be read by Excel. You
may still have a lot of tidying up to do once the data has been put
into Excel.
Pete
Where does the quot;originalquot; information come from? If possible can you
include a sample of it, i.e. HTML or whatever.--
wjohnson
------------------------------------------------------------------------
wjohnson's Profile: www.excelforum.com/member.php...oamp;userid=29640
View this thread: www.excelforum.com/showthread...hreadid=495673Unfortuantely I don't know if I am able to release all of our packaging sizes
and grades, however the attached text file shows the unique characters for
each type of data. If I could just find a way to split up the text into a
new row everytime it sees the word quot;productmasterquot; that would cut down on the
time it takes me to do this considerably. The text file below shows the
exact format of the information without line breaks. The full version just
has over 500 products with over 2000 grades. Just incase the text file won't
download here is an example for two products (with a total of 5 product
grades):
ProductMastergt;lt;ProductMaster Name=quot;Actafoam(R)quot;gt;lt;Product Grade=quot;F2,
Powderquot;gt;lt;Grade SKU=quot;0115791quot;gt;lt;c2gt;66.138 lb drumlt;/c2gt;lt;c3gt;0115791lt;/c3gt;lt;c4
null=quot;1quot;gt;lt;/c4gt;lt;c5 null=quot;1quot;gt;lt;/c5gt;lt;/Gradegt;lt;/Productgt;lt;Product Grade=quot;R-3quot;gt;lt;Grade
SKU=quot;0114873quot;gt;lt;c2gt;449.7384 lb drumlt;/c2gt;lt;c3gt;0114873lt;/c3gt;lt;c4 null=quot;1quot;gt;lt;/c4gt;lt;c5
null=quot;1quot;gt;lt;/c5gt;lt;/Gradegt;lt;/Productgt;lt;/ProductMastergt;lt;ProductMaster
Name=quot;Acudynequot;gt;lt;Product Grade=quot;180, 48 solidquot;gt;lt;Grade SKU=quot;0113541quot;gt;lt;c2gt;473.99
lb drumlt;/c2gt;lt;c3gt;0113541lt;/c3gt;lt;c4 null=quot;1quot;gt;lt;/c4gt;lt;c5
null=quot;1quot;gt;lt;/c5gt;lt;/Gradegt;lt;/Productgt;lt;Product Grade=quot;DHR, 48 solidquot;gt;lt;Grade
SKU=quot;0113550quot;gt;lt;c2gt;473.99 lb drumlt;/c2gt;lt;c3gt;0113550lt;/c3gt;lt;c4 null=quot;1quot;gt;lt;/c4gt;lt;c5
null=quot;1quot;gt;lt;/c5gt;lt;/Gradegt;lt;/Productgt;lt;Product Grade=quot;SCP 25% Solidquot;gt;lt;Grade
SKU=quot;0113548quot;gt;lt;c2gt;473.99 lb drumlt;/c2gt;lt;c3gt;0113548lt;/c3gt;lt;c4 null=quot;1quot;gt;lt;/c4gt;lt;c5
null=quot;1quot;gt;lt;/c5gt;lt;/Gradegt;lt;/Productgt;lt;/
As you can see the letters quot;productmasterquot; is an identifier of a new
product, quot;product gradequot; is an identifier of a new grade and so on. If I
could at least break it up a little bit into cells I could go through with
the asap utility and clean it up. I don't expect this to be quick, just
quicker.
Thanks!
quot;wjohnsonquot; wrote:
gt;
gt; Where does the quot;originalquot; information come from? If possible can you
gt; include a sample of it, i.e. HTML or whatever.
gt;
gt;
gt; --
gt; wjohnson
gt; ------------------------------------------------------------------------
gt; wjohnson's Profile: www.excelforum.com/member.php...oamp;userid=29640
gt; View this thread: www.excelforum.com/showthread...hreadid=495673
gt;
gt;
Dorn wrote:
gt; Unfortuantely I don't know if I am able to release all of our packaging sizes
gt; and grades, however the attached text file shows the unique characters for
gt; each type of data. If I could just find a way to split up the text into a
gt; new row everytime it sees the word quot;productmasterquot; that would cut down on the
gt; time it takes me to do this considerably. The text file below shows the
gt; exact format of the information without line breaks. The full version just
gt; has over 500 products with over 2000 grades. Just incase the text file won't
gt; download here is an example for two products (with a total of 5 product
gt; grades):
gt;
gt; ProductMastergt;lt;ProductMaster Name=quot;Actafoam(R)quot;gt;lt;Product Grade=quot;F2,
gt; Powderquot;gt;lt;Grade SKU=quot;0115791quot;gt;lt;c2gt;66.138 lb drumlt;/c2gt;lt;c3gt;0115791lt;/c3gt;lt;c4
gt; null=quot;1quot;gt;lt;/c4gt;lt;c5 null=quot;1quot;gt;lt;/c5gt;lt;/Gradegt;lt;/Productgt;lt;Product Grade=quot;R-3quot;gt;lt;Grade
gt; SKU=quot;0114873quot;gt;lt;c2gt;449.7384 lb drumlt;/c2gt;lt;c3gt;0114873lt;/c3gt;lt;c4 null=quot;1quot;gt;lt;/c4gt;lt;c5
gt; null=quot;1quot;gt;lt;/c5gt;lt;/Gradegt;lt;/Productgt;lt;/ProductMastergt;lt;ProductMaster
gt; Name=quot;Acudynequot;gt;lt;Product Grade=quot;180, 48 solidquot;gt;lt;Grade SKU=quot;0113541quot;gt;lt;c2gt;473.99
gt; lb drumlt;/c2gt;lt;c3gt;0113541lt;/c3gt;lt;c4 null=quot;1quot;gt;lt;/c4gt;lt;c5
gt; null=quot;1quot;gt;lt;/c5gt;lt;/Gradegt;lt;/Productgt;lt;Product Grade=quot;DHR, 48 solidquot;gt;lt;Grade
gt; SKU=quot;0113550quot;gt;lt;c2gt;473.99 lb drumlt;/c2gt;lt;c3gt;0113550lt;/c3gt;lt;c4 null=quot;1quot;gt;lt;/c4gt;lt;c5
gt; null=quot;1quot;gt;lt;/c5gt;lt;/Gradegt;lt;/Productgt;lt;Product Grade=quot;SCP 25% Solidquot;gt;lt;Grade
gt; SKU=quot;0113548quot;gt;lt;c2gt;473.99 lb drumlt;/c2gt;lt;c3gt;0113548lt;/c3gt;lt;c4 null=quot;1quot;gt;lt;/c4gt;lt;c5
gt; null=quot;1quot;gt;lt;/c5gt;lt;/Gradegt;lt;/Productgt;lt;/
gt;
gt; As you can see the letters quot;productmasterquot; is an identifier of a new
gt; product, quot;product gradequot; is an identifier of a new grade and so on. If I
gt; could at least break it up a little bit into cells I could go through with
gt; the asap utility and clean it up. I don't expect this to be quick, just
gt; quicker.
gt;
gt; Thanks!
gt;
gt; quot;wjohnsonquot; wrote:
gt;
gt;gt; Where does the quot;originalquot; information come from? If possible can you
gt;gt; include a sample of it, i.e. HTML or whatever.
gt;gt;
gt;gt;
gt;gt; --
gt;gt; wjohnson
gt;gt; ------------------------------------------------------------------------
gt;gt; wjohnson's Profile: www.excelforum.com/member.php...oamp;userid=29640
gt;gt; View this thread: www.excelforum.com/showthread...hreadid=495673
gt;gt;
gt;gt;
Dorn
1. What software generated the file? Do you have control of that
software such that you could see if it has other export formats that
might result in 1 row per item?
2. You may want to investigate opening the file in WORD and using Find |
Replace to replace quot;lt;productmasterquot; with quot;Line Break lt;productmasterquot;
Texas Handly
This file is slightly different (easier) than that pointed to in the
link in your original post - in that you had combinations like
amp;quot;|amp;quot and ;amp;gt;amp;lt; which made it more difficult to see the
pattern. Following on from Brian's suggestion, you want to
Find lt;ProductMaster Name=
Replace with Line Break
as you don't need the text. Similarly,
Find gt;lt;Product Grade=
Replace with , (comma), and
Find gt;lt;Grade SKU=
Replace with ,
This way you can build up a csv file quite quickly and then bring it
into Excel.
PeteThanks for all of your help and suggestions. I ended up figuring out how to
do it with a non excel solution. I used edit replace in notepad to change
all of the jibberish into html table tags then I saved it as a .html opened
it with internet explorer and copied and pasted the table into excel
quot;Petequot; wrote:
gt; This file is slightly different (easier) than that pointed to in the
gt; link in your original post - in that you had combinations like
gt; quot;|amp;quot and ;gt;lt; which made it more difficult to see the
gt; pattern. Following on from Brian's suggestion, you want to
gt; Find lt;ProductMaster Name=
gt; Replace with Line Break
gt; as you don't need the text. Similarly,
gt; Find gt;lt;Product Grade=
gt; Replace with , (comma), and
gt; Find gt;lt;Grade SKU=
gt; Replace with ,
gt;
gt; This way you can build up a csv file quite quickly and then bring it
gt; into Excel.
gt;
gt; Pete
gt;
gt;
Hello,
Try my web table import utility at:
www.geocities.com/excelmarkswayquot;Dornquot; wrote:
gt; Here's the problem:
gt;
gt; The products we sell have a product name, each product has one or more
gt; grades of the product (which usually relate to the quality of it) for each
gt; grade we have a package size and a unique number called a quot;SKUquot; that
gt; identifies that grade and package size in our system.
gt;
gt; I need to take the source code of a page with all of this information and
gt; turn it into an excel sheet with columns titled quot;productquot; quot;gradequot; quot;packagingquot;
gt; quot;skuquot; for each of the grades. The format of the html source code throws in
gt; all sorts of characters that I need to get rid of. It also lumps it into one
gt; block text with no line breaks. The below text file shows the format of two
gt; products, the first product has 2 grades and the second product has 3 grades.
gt; How can I use macros, formulas, anything to get the block of text into the
gt; format I want. It would take me years if I go in by hand and remove the
gt; unwanted characters and manually copy and paste them into cells. Please help
gt; I would appreciate it so so so much. Is this even possible (or feasible).
gt; There is no way I can get any other form of the data.
gt; Here is an example of what I have:
gt;
gt; www.savefile.com/files/7119995
gt;
gt; Example of what I need:
gt;
gt; www.savefile.com/files/9420878
gt;
gt; Happy Holidays Everyone, and a huge thank you for all of the people who help
gt; people on this site!
- Nov 21 Wed 2007 20:40
I'm stumped, splitting up a clump of text into useable information
close
全站熱搜
留言列表
發表留言