WebReadIE

 

Syntax: WEBREADIE(URL, Pattern, ErrorMessage, CodePage, DateTimeFormatSpecification, NumberFormatSpecification, AutoFormat)

 

Equivalent Function Names: WRIE

 

Purpose: The function downloads a website page from a given URL using Internet Explorer and its AJAX, Cookies and other functionality, parses it for a regular expression pattern and returns a range back. The results are cached using the standard Spearian's Market Data Cache.

 

This function takes longer than the WebRead function, as it opens the entire Internet Explorer, reuses its saved cookies and returns the page.

 

Thus, this is the ideal method for reading of the websites which require authentication, such as the websites of WSJ or Financial Times.

 

Parameters: 
 

1.URL is complete URL of the webpage, including the protocol, e.g. http://
 

2.Pattern is the Regular expression pattern.

Note: Pattern should not be longer than 255 characters.

If left blank, the default pattern of "(.*)" returning the complete html page is used.

Hint - use RegexBuddy or http://regexpal.com/ for testing of your expressions
 

3.ErrorMessage - if specified, it's returned instead of the default exception text
 

4.CodePage - the page used for parsing of the matched results to numbers and dates - the default value is "en-US". The list of possible values is at http://msdn.microsoft.com/en-us/goglobal/bb896001.aspx (check the column Culture Name)

In case you encounter any problems, see this article.
 

5.DateTimeFormatSpecification - if this field is present, the CodePage parameter is ignored for the conversions of Dates to Excel format. See the full list at http://msdn.microsoft.com/en-us/library/8kb3ddd4.aspx

Example: "Mon 23 Jun 8:45 AM 2013" will be recognized with this specification "ddd dd MMM h:mm tt yyyy"
 

6.NumberFormatSpecification  - if this field is present, the CodePage parameter is ignored for the conversions of Numbers to Excel format.

The parameter takes form of at least 4 characters:
1. Negative Sign character (1 character)
2. Currency Decimal Separator (1 character)
3. Currency Group Separator (1 character)
4. Currency Symbol (at least 1 character)

Example: "-.,$" is the standard setting for the US.

 

7.AutoFormat - an optional flag signifying if the result should be automatically converted to a range and if the result should be automatically formatted as Excel Data Types
 

 

Output: The matches range of the results. WARNING: The result cells are automatically converted to the right Excel type, e.g. date type.

 

Recommended procedure: If you see any useful statistics to put in to your spreadsheet, just open in the browser the website source and try to find the regular expression.

 

Note: If you find any given formula valuable, it can be incorporated into the Spearian's Native (=super-fast range) formulas

 

Examples:
1. Retrieve the top news headlines for a given stock

=WRIE("http://finance.yahoo.com/q/h?s=MSFT+Headlines","(?i)<div class=""mod yfi_quote_headline withsky"">\r\n<ul class=""yfncnhl newsheadlines""></ul>\r\n<h3><span>([^<]*)</span></h3>\r\n<ul>\r\n<li><a href=""([^<]*)"">([^<]*)</a><cite>") returns the top headline for a given stock

 

 

Note: the same result as with WR only slower

 

2. Check the contents of the page being processed

To check the html text file to be processed by regular expression, just use =WRIE("http://finance.yahoo.com/q/h?s=MSFT+Headlines","(.*)")

 

3. Read a page from WSJ requiring authentication

First, open the WSJ web page in Internet Explorer, log on and choose to save the login name and password. Then, the WRIE method will start working.

 

 

Copyright © 2013-2017 Jiri Pik

Document Version: Sunday, May 7, 2017