Suvarna's Weblog on Development for Mobile Devices

Software Design Engineer, Mobile Devices

Extracting Content out from HTML Windows on Pocket PC

Pocket PC 2003 does not support the RichEdit control, but the HTML control can be used as a canvas to display text and images. More information on how to use the HTML control can be found at http://msdn.microsoft.com/library/default.asp?url=/library/en-us/apippc/html/ppc_htmlctrl_nacl.asp

Extracting text back out of the HTML control is not easy on Pocket PC 2003, because the control offers no direct way to read its content. The code snippets below demonstrate two ways to do it.

Extracting Plain Text out of an HTML Window
There is a fairly simple way to do this: select the text in the HTML window and copy the selection into an IStream.
1. Select the text - use message DTM_SELECTALL
2. Copy the selection to an IStream - use message DTM_COPYSELECTIONTONEWISTREAM
3. Read from the IStream into a buffer
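
Step 3 can be sketched as a plain read loop. This is a minimal sketch, not the author's code: DrainStream and MemoryStream are hypothetical names, and the loop assumes only a Read(void*, unsigned long, unsigned long*) method that returns an HRESULT-style value, which is the shape of IStream::Read. The two messages from steps 1 and 2 are left out because the post does not spell out their parameters, and the encoding of the copied bytes is not stated either, so the result is kept as raw bytes.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Drains a stream into a byte buffer. Works with any type exposing
// Read(void*, unsigned long, unsigned long*) that returns an HRESULT-style
// long (negative on failure), which is the shape of IStream::Read.
template <typename Stream>
std::vector<unsigned char> DrainStream(Stream& stream)
{
    std::vector<unsigned char> out;
    unsigned char chunk[64];
    unsigned long got = 0;
    do {
        got = 0;
        long hr = stream.Read(chunk, sizeof(chunk), &got);
        if (hr < 0) break;               // failure: stop and keep what was read
        out.insert(out.end(), chunk, chunk + got);
    } while (got > 0);                   // zero bytes read means end of stream
    return out;
}

// Hypothetical in-memory stand-in for IStream, present only so the loop
// can be exercised off-device.
struct MemoryStream {
    std::string data;
    std::size_t pos;
    explicit MemoryStream(const std::string& d) : data(d), pos(0) {}
    long Read(void* dst, unsigned long cb, unsigned long* pcbRead) {
        unsigned long n = static_cast<unsigned long>(data.size() - pos);
        if (n > cb) n = cb;
        std::memcpy(dst, data.data() + pos, n);
        pos += n;
        if (pcbRead) *pcbRead = n;
        return n == cb ? 0 : 1;          // S_OK when the request was filled, else S_FALSE
    }
};
```

On the device, the same loop would presumably be called as DrainStream(*pStream) with the IStream pointer obtained in step 2, followed by pStream->Release().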

The catch is that there is no way to deselect the text afterwards. If you are still using the HTML window to display content, you may not want it left fully selected.
There is another way to extract the text. It is a little more complicated, but it leaves the selection alone and can also return the content with all of its HTML formatting if you want it.

Using Div tags to facilitate text extraction from HTML Window
Put div tags around any content that you want to extract from the HTML Window
<div id="dividID">ACTUAL CONTENT</div>
Here ID stands for a number that is unique to each piece of content you want to extract, giving ids such as divid1, divid2 and so on
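
As an illustration, the wrapping can be done with a small helper when the HTML is generated. This is a sketch under assumptions: MakeDivId and WrapInDiv are hypothetical names, not part of any Pocket PC API, and std::to_wstring assumes a C++11 compiler (on older toolchains StringCchPrintf would do the same job). The id format is chosen to match the "divid%ld" name that the extraction code in this post looks up.

```cpp
#include <string>

// Hypothetical helper: builds the id used in the markup, e.g. "divid7".
// The "divid" prefix plus a number mirrors the "divid%ld" lookup name.
std::wstring MakeDivId(long id)
{
    return L"divid" + std::to_wstring(id);
}

// Hypothetical helper: wraps a fragment of HTML in a div carrying that id,
// so the fragment can later be found by name.
std::wstring WrapInDiv(long id, const std::wstring& content)
{
    return L"<div id=\"" + MakeDivId(id) + L"\">" + content + L"</div>";
}
```

For example, WrapInDiv(7, L"Hello") yields <div id="divid7">Hello</div>, and the caller is responsible for handing out a different number for each fragment.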

The following code snippet shows how to extract the content enclosed within a particular div tag

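IDispatch *pDisp = NULL;
IPIEHTMLDocument2 *pHTMLDocument = NULL;
IPIEHTMLDivElement *pHTMLDivElement = NULL;
BSTR bText = NULL; // Filled in by get_innerHTML / get_innerText; free it with SysFreeString
WCHAR szText[256];
OLECHAR FAR* szTemp;
DISPID id;

// Ask the HTML control for the IDispatch of its document
SendMessage( DTM_DOCUMENTDISPATCH, (WPARAM)0, (LPARAM)&pDisp );
if( NULL != pDisp)
{
    pDisp->QueryInterface( __uuidof(IPIEHTMLDocument2), (void**)&pHTMLDocument);
}

if ( NULL != pHTMLDocument)
{
    szTemp = szText;
    StringCchPrintf(szText, 256, L"divid%ld", ID); // This is the div id that we are looking for

    VARIANT varResult;
    VariantInit(&varResult);
    DISPPARAMS dispparamsNoArgs = {NULL, NULL, 0, 0};

    // Look up the div by name on the document, then fetch it as a property
    if ( SUCCEEDED(pHTMLDocument->GetIDsOfNames(IID_NULL, &szTemp, 1, LOCALE_USER_DEFAULT, &id)) &&
         SUCCEEDED(pHTMLDocument->Invoke( id, IID_NULL, LOCALE_USER_DEFAULT, DISPATCH_PROPERTYGET, &dispparamsNoArgs, &varResult, NULL, NULL)) &&
         VT_DISPATCH == varResult.vt && NULL != varResult.pdispVal)
    {
        varResult.pdispVal->QueryInterface( __uuidof(IPIEHTMLDivElement), (void **) &pHTMLDivElement);
    }
    VariantClear(&varResult); // Releases the IDispatch held in the variant
}

if ( NULL != pHTMLDivElement)
{
    // Call one of the two; each call allocates a new BSTR into bText
    pHTMLDivElement->get_innerHTML(&bText); // To get html content
    // pHTMLDivElement->get_innerText(&bText); // To get text only content
}

// Use bText here, then clean up
SysFreeString(bText);
if ( NULL != pHTMLDivElement) pHTMLDivElement->Release();
if ( NULL != pHTMLDocument) pHTMLDocument->Release();
if ( NULL != pDisp) pDisp->Release();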

Disclaimer : This posting is provided "AS IS" with no warranties, and confers no rights. Use of included script samples are subject to the terms specified at http://www.microsoft.com/info/cpyright.htm
