product: ALWebSpider
Description: The functions in this unit allow you to download a
World Wide Web site from the Internet to a local directory,
recursively building all directories and retrieving HTML, images,
and other files from the server to your computer. The unit
preserves the original site's relative link structure: simply
open a page of the "mirrored" website in your browser and you
can browse the site from link to link, as if you were viewing it
online.
Known bug: a link like
<td><img src="/imgmix/situation.php?dept=Corse du Sud&coordx=1149.00&coordy=1657.60&t=1133520053 width="200" height="200" border="0"></td>
will not be handled correctly because one " is missing.
This is not a valid HTML document, but unfortunately Internet Explorer
renders this kind of error correctly, which means a webmaster can make
the mistake without noticing it, so we need to find a way to handle it.
**************************************************************}
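{Usage sketch (not compiled as part of this unit) - a minimal example, assuming
a TALHttpClient descendant such as TALWinInetHttpClient from the ALWinInetHttpClient
unit is available in your copy of the library (class and unit names are assumptions,
adjust them to your version):

  uses ALWinInetHttpClient, ALWebSpider;

  var aHttpClient: TALWinInetHttpClient; // assumed TALHttpClient descendant
      aSpider: TAlTrivialWebSpider;
  begin
    aHttpClient := TALWinInetHttpClient.Create;
    aSpider := TAlTrivialWebSpider.Create;
    try
      aSpider.HttpClient := aHttpClient;        // http client used to download the pages
      aSpider.SaveDirectory := 'c:\mirror\';    // must already exist
      aSpider.StayInStartDomain := True;        // do not follow links to other hosts
      aSpider.DownloadImage := True;            // also download images
      aSpider.UpdateLinkToLocalPath := True;    // rewrite links to the local copies
      aSpider.Crawl('http://www.example.com/'); // start mirroring from this url
    finally
      aSpider.Free;
      aHttpClient.Free;
    end;
  end;}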
unit ALWebSpider;
interface
{$IF CompilerVersion >= 25} {Delphi XE4}
{$IFEND}
Uses System.classes,
AlAvlBinaryTree,
AlHTTPClient,
AlStringList;
Type
{------------------------------------------------------------------}
TAlWebSpiderCrawlDownloadSuccessEvent = procedure (Sender: TObject;
const Url: AnsiString;
HTTPResponseHeader: TALHTTPResponseHeader;
HttpResponseContent: TStream;
Var StopCrawling: Boolean) of object;
{-------------------------------------------------------------------}
TAlWebSpiderCrawlDownloadRedirectEvent = procedure (Sender: TObject;
const Url: AnsiString;
const RedirectedTo: AnsiString;
HTTPResponseHeader: TALHTTPResponseHeader;
Var StopCrawling: Boolean) of object;
{----------------------------------------------------------------}
TAlWebSpiderCrawlDownloadErrorEvent = procedure (Sender: TObject;
const URL: AnsiString;
const ErrorMessage: AnsiString;
HTTPResponseHeader: TALHTTPResponseHeader;
Var StopCrawling: Boolean) of object;
{--------------------------------------------------------------}
TAlWebSpiderCrawlGetNextLinkEvent = procedure (Sender: TObject;
Var Url: AnsiString) of object;
{-----------------------------------------------------------}
TAlWebSpiderCrawlFindLinkEvent = Procedure (Sender: TObject;
const HtmlTagString: AnsiString;
HtmlTagParams: TALStrings;
const URL: AnsiString) of object;
{-----------------------------------------------------------------}
TAlWebSpiderCrawlEndEvent = Procedure (Sender: TObject) of object;
{--------------------------------------------------------------------------------------------------}
TAlWebSpiderCrawlBeforeDownloadEvent = Procedure (Sender: TObject; const Url: AnsiString) of object;
{-----------------------------------------------------------------}
TAlWebSpiderCrawlAfterDownloadEvent = Procedure (Sender: TObject;
const Url: AnsiString;
HTTPResponseHeader: TALHTTPResponseHeader;
HttpResponseContent: TStream;
Var StopCrawling: Boolean) of object;
{------------------------------------------------------------------------------}
TAlWebSpiderUpdateLinkToLocalPathGetNextFileEvent = procedure (Sender: TObject;
Var FileName: AnsiString;
Var BaseHref: AnsiString) of object;
{---------------------------------------------------------------------------}
TAlWebSpiderUpdateLinkToLocalPathFindLinkEvent = Procedure (Sender: TObject;
const HtmlTagString: AnsiString;
HtmlTagParams: TALStrings;
const URL: AnsiString;
Var LocalPath: AnsiString) of object;
{---------------------------------------------------------------------------------}
TAlWebSpiderUpdateLinkToLocalPathEndEvent = Procedure (Sender: TObject) of object;
{---------------------------}
TAlWebSpider = Class(TObject)
Private
fOnUpdateLinkToLocalPathEnd: TAlWebSpiderUpdateLinkToLocalPathEndEvent;
fOnUpdateLinkToLocalPathFindLink: TAlWebSpiderUpdateLinkToLocalPathFindLinkEvent;
fOnUpdateLinkToLocalPathGetNextFile: TAlWebSpiderUpdateLinkToLocalPathGetNextFileEvent;
fOnCrawlDownloadError: TAlWebSpiderCrawlDownloadErrorEvent;
fOnCrawlDownloadRedirect: TAlWebSpiderCrawlDownloadRedirectEvent;
fOnCrawlDownloadSuccess: TAlWebSpiderCrawlDownloadSuccessEvent;
FOnCrawlFindLink: TAlWebSpiderCrawlFindLinkEvent;
fOnCrawlGetNextLink: TAlWebSpiderCrawlGetNextLinkEvent;
FOnCrawlEnd: TAlWebSpiderCrawlEndEvent;
FHttpClient: TalHttpClient;
fOnCrawlBeforeDownload: TAlWebSpiderCrawlBeforeDownloadEvent;
fOnCrawlAfterDownload: TAlWebSpiderCrawlAfterDownloadEvent;
Public
Procedure Crawl; {Launch the crawling of the pages}
Procedure UpdateLinkToLocalPath; {Update the links of the downloaded pages to local paths}
Property OnCrawlBeforeDownload: TAlWebSpiderCrawlBeforeDownloadEvent read fOnCrawlBeforeDownload write fOnCrawlBeforeDownload; {Fired just before a page is downloaded}
Property OnCrawlAfterDownload: TAlWebSpiderCrawlAfterDownloadEvent read fOnCrawlAfterDownload write fOnCrawlAfterDownload; {Fired just after a page is downloaded}
Property OnCrawlDownloadSuccess: TAlWebSpiderCrawlDownloadSuccessEvent read fOnCrawlDownloadSuccess write fOnCrawlDownloadSuccess; {When a page is successfully downloaded}
Property OnCrawlDownloadRedirect: TAlWebSpiderCrawlDownloadRedirectEvent read fOnCrawlDownloadRedirect write fOnCrawlDownloadRedirect; {When a page is redirected}
Property OnCrawlDownloadError: TAlWebSpiderCrawlDownloadErrorEvent read fOnCrawlDownloadError write fOnCrawlDownloadError; {When the download of a page encounters an error}
Property OnCrawlGetNextLink: TAlWebSpiderCrawlGetNextLinkEvent read fOnCrawlGetNextLink Write FOnCrawlGetNextLink; {When we need another url to download}
Property OnCrawlFindLink: TAlWebSpiderCrawlFindLinkEvent read FOnCrawlFindLink Write FOnCrawlFindLink; {When we find a link in the url just downloaded}
Property OnCrawlEnd: TAlWebSpiderCrawlEndEvent read FOnCrawlEnd write fOnCrawlEnd; {When there are no more urls to crawl}
Property OnUpdateLinkToLocalPathGetNextFile: TAlWebSpiderUpdateLinkToLocalPathGetNextFileEvent read fOnUpdateLinkToLocalPathGetNextFile write fOnUpdateLinkToLocalPathGetNextFile; {When we need another file in which to update links to local paths}
property OnUpdateLinkToLocalPathFindLink: TAlWebSpiderUpdateLinkToLocalPathFindLinkEvent read fOnUpdateLinkToLocalPathFindLink write fOnUpdateLinkToLocalPathFindLink; {When we find a link and we need the local path for the file}
property OnUpdateLinkToLocalPathEnd: TAlWebSpiderUpdateLinkToLocalPathEndEvent read fOnUpdateLinkToLocalPathEnd write fOnUpdateLinkToLocalPathEnd; {When there are no more local files in which to update links}
Property HttpClient: TalHttpClient Read FHttpClient write FHttpClient; {http client used to crawl the web}
end;
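{TAlTrivialWebSpider below is a ready-to-use wrapper around TAlWebSpider: it wires
all the events above for you, keeps track of downloaded and pending urls in two
AVL binary trees, saves every downloaded page under SaveDirectory and can finally
rewrite the links of the mirrored pages so that they point to the local files.}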
{-------------------------------------------------------------------------------------------------------------------------------------------------}
TAlTrivialWebSpiderCrawlProgressEvent = Procedure (Sender: TObject; UrltoDownload, UrlDownloaded: Integer; const CurrentUrl: AnsiString) of object;
{-----------------------------------------------------------------------------------------------------------------------------}
TAlTrivialWebSpiderUpdateLinkToLocalPathProgressEvent = Procedure (Sender: TObject; const aFileName: AnsiString) of object;
{-----------------------------------------------------------------}
TAlTrivialWebSpiderCrawlFindLinkEvent = Procedure (Sender: TObject;
const HtmlTagString: AnsiString;
HtmlTagParams: TALStrings;
const URL: AnsiString;
Var Ignore: Boolean) of object;
{----------------------------------}
TAlTrivialWebSpider = Class(Tobject)
Private
FWebSpider: TalWebSpider;
fStartUrl: AnsiString;
fLstUrlCrawled: TALStrings;
fLstErrorEncountered: TALStrings;
FPageDownloadedBinTree: TAlStringKeyAVLBinaryTree;
FPageNotYetDownloadedBinTree: TAlStringKeyAVLBinaryTree;
FCurrentDeepLevel: Integer;
FCurrentLocalFileNameIndex: Integer;
fMaxDeepLevel: Integer;
fOnCrawlBeforeDownload: TAlWebSpiderCrawlBeforeDownloadEvent;
fUpdateLinkToLocalPath: boolean;
fExcludeMask: AnsiString;
fStayInStartDomain: Boolean;
fSaveDirectory: AnsiString;
fSplitDirectoryAmount: integer;
FHttpClient: TalHttpClient;
fIncludeMask: AnsiString;
fOnCrawlAfterDownload: TAlWebSpiderCrawlAfterDownloadEvent;
fOnCrawlFindLink: TAlTrivialWebSpiderCrawlFindLinkEvent;
fDownloadImage: Boolean;
fOnUpdateLinkToLocalPathProgress: TAlTrivialWebSpiderUpdateLinkToLocalPathProgressEvent;
fOnCrawlProgress: TAlTrivialWebSpiderCrawlProgressEvent;
procedure WebSpiderCrawlDownloadError(Sender: TObject; const URL, ErrorMessage: AnsiString; HTTPResponseHeader: TALHTTPResponseHeader; var StopCrawling: Boolean);
procedure WebSpiderCrawlDownloadRedirect(Sender: TObject; const Url, RedirectedTo: AnsiString; HTTPResponseHeader: TALHTTPResponseHeader; var StopCrawling: Boolean);
procedure WebSpiderCrawlDownloadSuccess(Sender: TObject; const Url: AnsiString; HTTPResponseHeader: TALHTTPResponseHeader; HttpResponseContent: TStream; var StopCrawling: Boolean);
procedure WebSpiderCrawlFindLink(Sender: TObject; const HtmlTagString: AnsiString; HtmlTagParams: TALStrings; const URL: AnsiString);
procedure WebSpiderCrawlGetNextLink(Sender: TObject; var Url: AnsiString);
procedure WebSpiderUpdateLinkToLocalPathFindLink(Sender: TObject; const HtmlTagString: AnsiString; HtmlTagParams: TALStrings; const URL: AnsiString; var LocalPath: AnsiString);
procedure WebSpiderUpdateLinkToLocalPathGetNextFile(Sender: TObject; var FileName, BaseHref: AnsiString);
function GetNextLocalFileName(const aContentType: AnsiString): AnsiString;
Protected
Public
Constructor Create;
Destructor Destroy; override;
Procedure Crawl(const aUrl: AnsiString); overload; {Launch the Crawling of the page}
procedure Crawl(const aUrl: AnsiString; LstUrlCrawled: TALStrings; LstErrorEncountered: TALStrings); overload;
Property HttpClient: TalHttpClient Read FHttpClient write FHttpClient;
Property DownloadImage: Boolean read fDownloadImage write fDownloadImage default false;
Property StayInStartDomain: Boolean read fStayInStartDomain write fStayInStartDomain default true;
Property UpdateLinkToLocalPath: boolean read fUpdateLinkToLocalPath write fUpdateLinkToLocalPath default True;
Property MaxDeepLevel: Integer read fMaxDeepLevel write fMaxDeepLevel default -1;
Property ExcludeMask: AnsiString read fExcludeMask write fExcludeMask;
Property IncludeMask: AnsiString read fIncludeMask write fIncludeMask;
Property SaveDirectory: AnsiString read fSaveDirectory write fSaveDirectory;
Property SplitDirectoryAmount: integer read fSplitDirectoryAmount write fSplitDirectoryAmount default 5000;
Property OnCrawlBeforeDownload: TAlWebSpiderCrawlBeforeDownloadEvent read fOnCrawlBeforeDownload write fOnCrawlBeforeDownload; {Fired just before a page is downloaded}
Property OnCrawlAfterDownload: TAlWebSpiderCrawlAfterDownloadEvent read fOnCrawlAfterDownload write fOnCrawlAfterDownload; {Fired just after a page is downloaded}
Property OnCrawlFindLink: TAlTrivialWebSpiderCrawlFindLinkEvent read fOnCrawlFindLink write fOnCrawlFindLink; {When a link is found}
Property OnCrawlProgress: TAlTrivialWebSpiderCrawlProgressEvent read fOnCrawlProgress write fOnCrawlProgress;
Property OnUpdateLinkToLocalPathProgress: TAlTrivialWebSpiderUpdateLinkToLocalPathProgressEvent read fOnUpdateLinkToLocalPathProgress write fOnUpdateLinkToLocalPathProgress;
end;
{----------------------------------------------------------------------------------}
TAlTrivialWebSpider_PageDownloadedBinTreeNode = Class(TALStringKeyAVLBinaryTreeNode)
Private
Protected
Public
Data: AnsiString;
end;
{----------------------------------------------------------------------------------------}
TAlTrivialWebSpider_PageNotYetDownloadedBinTreeNode = Class(TALStringKeyAVLBinaryTreeNode)
Private
Protected
Public
DeepLevel: Integer;
end;
implementation
Uses Winapi.Windows,
System.sysutils,
Winapi.WinInet,
Winapi.UrlMon,
AlHTML,
AlMime,
ALString;
type
{*****************************************}
_TAlWebSpiderHandleTagfunctExtData = record
WebSpiderObj: TAlWebSpider;
CurrentBaseHref: AnsiString;
end;
{************************************************************************}
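{Tag handler given to ALFastTagReplace while crawling: it is called once for every
html tag found in the downloaded page, resolves the url attributes of the tag
against the current base href and fires OnCrawlFindLink for every http/https url
found; it never rewrites the tag itself (Handled stays False).}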
Function _AlWebSpiderExtractUrlHandleTagfunct(const TagString: AnsiString;
TagParams: TALStrings;
ExtData: pointer;
Var Handled: Boolean): AnsiString;
{---------------------------------------------------------------}
Procedure FindUrl(aUrl: ansiString; const aBaseHref: AnsiString);
Begin
{skip anchors that point inside the same document}
If (aUrl <> '') and (AlPos('#',aUrl) <> 1) then begin
{make url full path}
aUrl := AlCombineUrl(aUrl, aBaseHref);
{only continue if it's an http(s) scheme}
IF (AlExtractShemeFromUrl(aUrl) in [INTERNET_SCHEME_HTTP, INTERNET_SCHEME_HTTPS]) then
{fire findlink Event}
with _TAlWebSpiderHandleTagfunctExtData(ExtData^) do
WebSpiderObj.FOnCrawlFindLink(WebSpiderObj,
TagString,
TagParams,
aUrl);
end;
end;
Var Str: AnsiString;
LowerTagString: AnsiString;
begin
Handled := False;
Result := '';
ALCompactHtmlTagParams(TagParams);
LowerTagString := AlLowerCase(TagString);
with _TAlWebSpiderHandleTagfunctExtData(ExtData^) do begin
If LowerTagString = 'a' then FindUrl(ALTrim(TagParams.Values['href']),CurrentBaseHref)
else If LowerTagString = 'applet' then Begin
str := ALTrim(TagParams.Values['codebase']); //The CODEBASE parameter specifies where the jar and cab files are located.
{make str full path}
If str <> '' then Str := AlCombineUrl(Str, CurrentBaseHref)
else Str := CurrentBaseHref;
FindUrl(ALTrim(TagParams.Values['code']), Str); //The URL specified by code might be relative to the codebase attribute.
FindUrl(ALTrim(TagParams.Values['archive']), Str); //The URL specified by archive might be relative to the codebase attribute.
end
else if LowerTagString = 'area' then FindUrl(ALTrim(TagParams.Values['href']),CurrentBaseHref)
else if LowerTagString = 'bgsound' then FindUrl(ALTrim(TagParams.Values['src']),CurrentBaseHref)
else if LowerTagString = 'blockquote' then FindUrl(ALTrim(TagParams.Values['cite']),CurrentBaseHref)
else if LowerTagString = 'body' then FindUrl(ALTrim(TagParams.Values['background']),CurrentBaseHref)
else if LowerTagString = 'del' then FindUrl(ALTrim(TagParams.Values['cite']),CurrentBaseHref)
else if LowerTagString = 'embed' then FindUrl(ALTrim(TagParams.Values['src']),CurrentBaseHref)
else if LowerTagString = 'frame' then begin
FindUrl(ALTrim(TagParams.Values['longdesc']),CurrentBaseHref);
FindUrl(ALTrim(TagParams.Values['src']),CurrentBaseHref);
end
else if LowerTagString = 'head' then FindUrl(ALTrim(TagParams.Values['profile']),CurrentBaseHref)
else if LowerTagString = 'iframe' then begin
FindUrl(ALTrim(TagParams.Values['longdesc']),CurrentBaseHref);
FindUrl(ALTrim(TagParams.Values['src']),CurrentBaseHref);
end
else if LowerTagString = 'ilayer' then begin
FindUrl(ALTrim(TagParams.Values['background']),CurrentBaseHref);
FindUrl(ALTrim(TagParams.Values['src']),CurrentBaseHref);
end
else If LowerTagString = 'img' then Begin
FindUrl(ALTrim(TagParams.Values['longdesc']),CurrentBaseHref);
FindUrl(ALTrim(TagParams.Values['src']),CurrentBaseHref);
FindUrl(ALTrim(TagParams.Values['usemap']),CurrentBaseHref);
FindUrl(ALTrim(TagParams.Values['dynsrc']),CurrentBaseHref);
FindUrl(ALTrim(TagParams.Values['lowsrc']),CurrentBaseHref);
end
else if LowerTagString = 'input' then begin
FindUrl(ALTrim(TagParams.Values['src']),CurrentBaseHref);
FindUrl(ALTrim(TagParams.Values['usemap']),CurrentBaseHref);
FindUrl(ALTrim(TagParams.Values['dynsrc']),CurrentBaseHref);
FindUrl(ALTrim(TagParams.Values['lowsrc']),CurrentBaseHref);
end
else if LowerTagString = 'ins' then FindUrl(ALTrim(TagParams.Values['cite']),CurrentBaseHref)
else if LowerTagString = 'layer' then begin
FindUrl(ALTrim(TagParams.Values['background']),CurrentBaseHref);
FindUrl(ALTrim(TagParams.Values['src']),CurrentBaseHref);
end
else if LowerTagString = 'link' then FindUrl(ALTrim(TagParams.Values['href']),CurrentBaseHref)
else If LowerTagString = 'object' then Begin
str := ALTrim(TagParams.Values['codebase']); //The CODEBASE parameter specifies where the jar and cab files are located.
{make str full path}
If str <> '' then Str := AlCombineUrl(Str,CurrentBaseHref)
else Str := CurrentBaseHref;
FindUrl(ALTrim(TagParams.Values['classid']), Str); //The URL specified by classid might be relative to the codebase attribute.
FindUrl(ALTrim(TagParams.Values['data']), Str); //The URL specified by data might be relative to the codebase attribute.
FindUrl(ALTrim(TagParams.Values['archive']), Str); //The URL specified by archive might be relative to the codebase attribute.
FindUrl(ALTrim(TagParams.Values['usemap']), CurrentBaseHref);
end
else if LowerTagString = 'q' then FindUrl(ALTrim(TagParams.Values['cite']),CurrentBaseHref)
else if LowerTagString = 'script' then FindUrl(ALTrim(TagParams.Values['src']),CurrentBaseHref)
else if LowerTagString = 'table' then FindUrl(ALTrim(TagParams.Values['background']),CurrentBaseHref)
else if LowerTagString = 'td' then FindUrl(ALTrim(TagParams.Values['background']),CurrentBaseHref)
else if LowerTagString = 'th' then FindUrl(ALTrim(TagParams.Values['background']),CurrentBaseHref)
else if LowerTagString = 'xml' then FindUrl(ALTrim(TagParams.Values['src']),CurrentBaseHref)
else if LowerTagString = 'base' then Begin
Str := ALTrim(TagParams.Values['href']);
If str <> '' then CurrentBaseHref := Str;
end;
end;
end;
{***********************************************************************************}
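{Tag handler given to ALFastTagReplace while updating links: for every url attribute
it fires OnUpdateLinkToLocalPathFindLink and, when a local path is returned, rebuilds
the whole tag with the attribute pointing to that local path; <base> tags are simply
removed so that the rewritten relative links resolve correctly.}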
Function _AlWebSpiderUpdateLinkToLocalPathHandleTagfunct(const TagString: AnsiString;
TagParams: TALStrings;
ExtData: pointer;
Var Handled: Boolean): AnsiString;
{---------------------------------------------------------}
Procedure FindUrl(const aParamName, aBaseHref: AnsiString);
Var aUrl: AnsiString;
aLocalPathValue : AnsiString;
Begin
{extract Url}
aUrl := ALTrim(TagParams.Values[aParamName]);
{skip anchors that point inside the same document}
If (aUrl <> '') and (AlPos('#',aUrl) <> 1) then begin
{make url full path}
aUrl := AlCombineUrl(aUrl, aBaseHref);
{only continue if it's an http(s) scheme}
IF (AlExtractShemeFromUrl(aUrl) in [INTERNET_SCHEME_HTTP, INTERNET_SCHEME_HTTPS]) then begin
{init local path value}
aLocalPathValue := '';
{fire findlink Event}
with _TAlWebSpiderHandleTagfunctExtData(ExtData^) do
WebSpiderObj.fOnUpdateLinkToLocalPathFindLink(WebSpiderObj,
TagString,
TagParams,
aUrl,
aLocalPathValue);
{update tagParams}
If (aLocalPathValue <> '') then begin
Handled := True;
TagParams.Values[aParamName] := aLocalPathValue; // e.g. 1234.htm#foo
end;
end;
end;
end;
Var Str: AnsiString;
LowerTagString: AnsiString;
i: Integer;
begin
Handled := False;
Result := '';
ALCompactHtmlTagParams(TagParams);
LowerTagString := AlLowerCase(TagString);
with _TAlWebSpiderHandleTagfunctExtData(ExtData^) do begin
If LowerTagString = 'a' then FindUrl('href',CurrentBaseHref)
else If LowerTagString = 'applet' then Begin
str := ALTrim(TagParams.Values['codebase']); //The CODEBASE parameter specifies where the jar and cab files are located.
{make str full path}
If str <> '' then Str := AlCombineUrl(Str,CurrentBaseHref)
else Str := CurrentBaseHref;
FindUrl('code', Str); //The URL specified by code might be relative to the codebase attribute.
FindUrl('archive', Str); //The URL specified by archive might be relative to the codebase attribute.
end
else if LowerTagString = 'area' then FindUrl('href',CurrentBaseHref)
else if LowerTagString = 'bgsound' then FindUrl('src',CurrentBaseHref)
else if LowerTagString = 'blockquote' then FindUrl('cite',CurrentBaseHref)
else if LowerTagString = 'body' then FindUrl('background',CurrentBaseHref)
else if LowerTagString = 'del' then FindUrl('cite',CurrentBaseHref)
else if LowerTagString = 'embed' then FindUrl('src',CurrentBaseHref)
else if LowerTagString = 'frame' then begin
FindUrl('longdesc',CurrentBaseHref);
FindUrl('src',CurrentBaseHref);
end
else if LowerTagString = 'head' then FindUrl('profile',CurrentBaseHref)
else if LowerTagString = 'iframe' then begin
FindUrl('longdesc',CurrentBaseHref);
FindUrl('src',CurrentBaseHref);
end
else if LowerTagString = 'ilayer' then begin
FindUrl('background',CurrentBaseHref);
FindUrl('src',CurrentBaseHref);
end
else If LowerTagString = 'img' then Begin
FindUrl('longdesc',CurrentBaseHref);
FindUrl('src',CurrentBaseHref);
FindUrl('usemap',CurrentBaseHref);
FindUrl('dynsrc',CurrentBaseHref);
FindUrl('lowsrc',CurrentBaseHref);
end
else if LowerTagString = 'input' then begin
FindUrl('src',CurrentBaseHref);
FindUrl('usemap',CurrentBaseHref);
FindUrl('dynsrc',CurrentBaseHref);
FindUrl('lowsrc',CurrentBaseHref);
end
else if LowerTagString = 'ins' then FindUrl('cite',CurrentBaseHref)
else if LowerTagString = 'layer' then begin
FindUrl('background',CurrentBaseHref);
FindUrl('src',CurrentBaseHref);
end
else if LowerTagString = 'link' then FindUrl('href',CurrentBaseHref)
else If LowerTagString = 'object' then Begin
str := ALTrim(TagParams.Values['codebase']); //The CODEBASE parameter specifies where the jar and cab files are located.
{make str full path}
If str <> '' then Str := AlCombineUrl(Str, CurrentBaseHref)
else Str := CurrentBaseHref;
FindUrl('classid', Str); //The URL specified by classid might be relative to the codebase attribute.
FindUrl('data', Str); //The URL specified by data might be relative to the codebase attribute.
FindUrl('archive', Str); //The URL specified by archive might be relative to the codebase attribute.
FindUrl('usemap', CurrentBaseHref);
end
else if LowerTagString = 'q' then FindUrl('cite',CurrentBaseHref)
else if LowerTagString = 'script' then FindUrl('src',CurrentBaseHref)
else if LowerTagString = 'table' then FindUrl('background',CurrentBaseHref)
else if LowerTagString = 'td' then FindUrl('background',CurrentBaseHref)
else if LowerTagString = 'th' then FindUrl('background',CurrentBaseHref)
else if LowerTagString = 'xml' then FindUrl('src',CurrentBaseHref)
else if LowerTagString = 'base' then begin
Handled := True;
exit;
end;
{update the html source code}
If handled then begin
Result := '<'+TagString;
for i := 0 to TagParams.Count - 1 do
If TagParams.Names[i] <> '' then Result := Result + ' ' + TagParams.Names[i] + '="'+ alStringReplace(TagParams.ValueFromIndex[i],
'"',
'&quot;', // escape embedded double quotes so the rebuilt attribute stays valid
[rfReplaceAll]) + '"'
else Result := Result + ' ' + TagParams[i];
Result := result + '>';
end;
end;
end;
{***************************}
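{Main crawl loop: repeatedly ask OnCrawlGetNextLink for a url, download it with
HttpClient, sniff the real mime content type, extract the links of html pages
(firing OnCrawlFindLink for each of them) and report the result through
OnCrawlDownloadSuccess / OnCrawlDownloadRedirect / OnCrawlDownloadError, until no
more url is returned or one of the events sets StopCrawling to True.}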
procedure TAlWebSpider.Crawl;
Var currentUrl: AnsiString;
StopCrawling: Boolean;
UrlRedirect: Boolean;
DownloadError: Boolean;
CurrentHttpResponseHeader: TALHTTPResponseHeader;
CurrentHttpResponseContent: TStream;
aExtData: _TAlWebSpiderHandleTagfunctExtData;
pMimeTypeFromData: LPWSTR;
Str: AnsiString;
Begin
Try
StopCrawling := False;
If not assigned(FOnCrawlGetNextLink) then exit;
{start the main loop}
While True do begin
CurrentUrl := '';
FOnCrawlGetNextLink(self, CurrentUrl); {Get the current url to process}
CurrentUrl := ALTrim(CurrentUrl);
If CurrentUrl = '' then Exit; {no more urls to download, exit}
UrlRedirect := False;
DownloadError := False;
CurrentHttpResponseContent := TmemoryStream.Create;
CurrentHttpResponseHeader:= TALHTTPResponseHeader.Create;
Try
Try
{the onbeforedownloadevent}
if assigned(fOnCrawlBeforeDownload) then fOnCrawlBeforeDownload(Self,CurrentURL);
Try
{download the page}
FHttpClient.Get(CurrentURL,
CurrentHttpResponseContent,
CurrentHttpResponseHeader);
Finally
{the onAfterdownloadevent}
if assigned(fOnCrawlAfterDownload) then fOnCrawlAfterDownload(Self,CurrentURL, CurrentHttpResponseHeader, CurrentHttpResponseContent, StopCrawling);
End;
except
on E: Exception do begin
{in case of url redirect}
If Alpos('3',CurrentHttpResponseHeader.StatusCode)=1 then begin
UrlRedirect := True;
If assigned(FOnCrawlDownloadRedirect) then fOnCrawlDownloadRedirect(Self,
CurrentUrl,
AlCombineUrl(ALTrim(CurrentHttpResponseHeader.Location), CurrentUrl),
CurrentHttpResponseHeader,
StopCrawling);
end
{in case of any other error}
else begin
DownloadError := True;
If assigned(FOnCrawlDownloadError) then fOnCrawlDownloadError(Self,
CurrentUrl,
AnsiString(E.Message),
CurrentHttpResponseHeader,
StopCrawling);
end;
end;
end;
{download OK}
If (not UrlRedirect) and (not DownloadError) then begin
{if size = 0 there is nothing to do}
if CurrentHTTPResponseContent.Size > 0 then begin
{read the content in Str}
CurrentHTTPResponseContent.Position := 0;
SetLength(Str, CurrentHTTPResponseContent.size);
CurrentHTTPResponseContent.ReadBuffer(pointer(Str)^,CurrentHTTPResponseContent.Size);
{check the mime content type because some servers send a wrong mime content type}
IF (FindMimeFromData(
nil, // bind context - can be nil
nil, // url - can be nil
PAnsiChar(str), // buffer with data to sniff - can be nil (pwzUrl must be valid)
length(str), // size of buffer
PWidechar(WideString(CurrentHttpResponseHeader.ContentType)), // proposed mime type, if any - can be nil
0, // will be defined
pMimeTypeFromData, // the suggested mime
0 // must be 0
) <> NOERROR) then pMimeTypeFromData := PWidechar(WideString(CurrentHttpResponseHeader.ContentType));
{launch the analysis of the page if the content type is text/html}
If ALSameText(AnsiString(pMimeTypeFromData),'text/html') and
assigned(FOnCrawlFindLink) then begin
{init the CurrentBaseHref of the aExtData object}
aExtData.WebSpiderObj := self;
aExtData.CurrentBaseHref := CurrentURL;
{extract the list of url to download}
ALHideHtmlUnwantedTagForHTMLHandleTagfunct(Str, False, #1);
ALFastTagReplace(Str,
'<',
'>',
_AlWebSpiderExtractUrlHandleTagfunct,
true,
@aExtData,
[rfreplaceall]);
end;
end;
{trigger the event OnCrawlDownloadSuccess}
if assigned(FOnCrawlDownloadSuccess) then begin
CurrentHTTPResponseContent.Position := 0;
fOnCrawlDownloadSuccess(self,
CurrentUrl,
CurrentHttpResponseHeader,
CurrentHttpResponseContent,
StopCrawling);
end;
end;
{if StopCrawling then exit}
If StopCrawling then exit;
finally
CurrentHTTPResponseContent.free;
CurrentHTTPResponseHeader.free;
end;
end;
finally
If assigned(FOnCrawlEnd) then FOnCrawlEnd(self);
end;
end;
{*******************************************}
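{Walk the local files returned by OnUpdateLinkToLocalPathGetNextFile, re-parse
their html tags, ask OnUpdateLinkToLocalPathFindLink for the local path of every
link found and save the file back with the rewritten links.}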
procedure TAlWebSpider.UpdateLinkToLocalPath;
Var currentFileName: AnsiString;
CurrentBaseHref: AnsiString;
aExtData: _TAlWebSpiderHandleTagfunctExtData;
Str: AnsiString;
Begin
Try
If not assigned(FOnUpdateLinktoLocalPathGetNextFile) or not assigned(OnUpdateLinkToLocalPathFindLink) then exit;
{start the main loop}
While True do begin
CurrentFileName := '';
CurrentBaseHref := '';
FOnUpdateLinktoLocalPathGetNextFile(self, CurrentFileName, CurrentBaseHref); {Get the current html file to process}
CurrentFileName := ALTrim(CurrentFileName);
CurrentBaseHref := ALTrim(CurrentBaseHref);
If CurrentFileName = '' then Exit; {no more files to update}
iF FileExists(String(CurrentFileName)) then begin
{read the file content}
Str := AlGetStringFromFile(CurrentFileName);
{init the CurrentBaseHref of the aExtData object}
aExtData.WebSpiderObj := self;
aExtData.CurrentBaseHref := CurrentBaseHref;
{Update the link}
ALHideHtmlUnwantedTagForHTMLHandleTagfunct(str, False, #1);
str := ALFastTagReplace(str,
'<',
'>',
_AlWebSpiderUpdateLinkToLocalPathHandleTagfunct,
true,
@aExtData,
[rfreplaceall]);
{restore the page to its original format}
str := AlStringReplace(str,
#1,
'<',
[rfReplaceAll]);
{save the result string}
AlSaveStringToFile(Str,CurrentFileName);
end;
end;
finally
If assigned(FOnUpdateLinkToLocalPathEnd) then FOnUpdateLinkToLocalPathEnd(self);
end;
end;
{**********************************************************}
procedure TAlTrivialWebSpider.Crawl(const aUrl: AnsiString);
begin
Crawl(aUrl, nil, nil);
end;
{**********************************************************************************************************************}
procedure TAlTrivialWebSpider.Crawl(const aUrl: AnsiString; LstUrlCrawled: TALStrings; LstErrorEncountered: TALStrings);
Var aNode: TAlTrivialWebSpider_PageNotYetDownloadedBinTreeNode;
Begin
{check the SaveDirectory}
if (SaveDirectory <> '') and not directoryExists(String(SaveDirectory)) then Raise Exception.CreateFmt('The directory "%s" does not exist!', [SaveDirectory]);
if fHttpClient = nil then Raise Exception.Create('The HttpClient cannot be nil!');
{init private var}
fStartUrl := ALTrim(aUrl);
FCurrentDeepLevel := 0;
FCurrentLocalFileNameIndex := 0;
fLstUrlCrawled := LstUrlCrawled;
fLstErrorEncountered := LstErrorEncountered;
FPageDownloadedBinTree:= TAlStringKeyAVLBinaryTree.Create;
FPageNotYetDownloadedBinTree:= TAlStringKeyAVLBinaryTree.Create;
Try
{add the start url to the fPageNotYetDownloadedBinTree}
aNode:= TAlTrivialWebSpider_PageNotYetDownloadedBinTreeNode.Create;
aNode.ID := fStartUrl;
aNode.DeepLevel := 0;
FPageNotYetDownloadedBinTree.AddNode(aNode);
{start the crawl}
fWebSpider.HttpClient := fHttpClient;
FWebSpider.onCrawlBeforeDownload := onCrawlBeforeDownload;
FWebSpider.onCrawlAfterDownload := onCrawlAfterDownload;
fWebSpider.Crawl;
{update the link on downloaded page to local path}
if fUpdateLinkToLocalPath then fWebSpider.UpdateLinkToLocalPath;
finally
FPageDownloadedBinTree.Free;
FPageNotYetDownloadedBinTree.Free;
fStartUrl := '';
FCurrentDeepLevel := 0;
FCurrentLocalFileNameIndex := 0;
fLstUrlCrawled := nil;
fLstErrorEncountered := nil;
fWebSpider.HttpClient := nil;
end;
end;
{*************************************}
constructor TAlTrivialWebSpider.Create;
begin
FWebSpider := TalWebSpider.Create;
fStartUrl := '';
fLstUrlCrawled := nil;
fLstErrorEncountered := nil;
FPageDownloadedBinTree := nil;
FPageNotYetDownloadedBinTree := nil;
FCurrentDeepLevel := 0;
FCurrentLocalFileNameIndex := 0;
fMaxDeepLevel := -1;
fOnCrawlBeforeDownload := nil;
fUpdateLinkToLocalPath := True;
fExcludeMask := '';
fStayInStartDomain := True;
fSaveDirectory := '';
fSplitDirectoryAmount := 5000;
FHttpClient := nil;
fIncludeMask := '*';
fOnCrawlAfterDownload := nil;
fOnCrawlFindLink := nil;
fDownloadImage := False;
fOnUpdateLinkToLocalPathProgress:=nil;
fOnCrawlProgress:=nil;
FWebSpider.onCrawlDownloadError := WebSpiderCrawlDownloadError;
FWebSpider.onCrawlDownloadRedirect := WebSpiderCrawlDownloadRedirect;
FWebSpider.onCrawlDownloadSuccess := WebSpiderCrawlDownloadSuccess;
FWebSpider.onCrawlFindLink := WebSpiderCrawlFindLink;
FWebSpider.onCrawlGetNextLink := WebSpiderCrawlGetNextLink;
FWebSpider.onUpdateLinkToLocalPathFindLink := WebSpiderUpdateLinkToLocalPathFindLink;
FWebSpider.onUpdateLinkToLocalPathGetNextFile := WebSpiderUpdateLinkToLocalPathGetNextFile;
end;
{*************************************}
destructor TAlTrivialWebSpider.Destroy;
begin
FWebSpider.Free;
inherited;
end;
{********************************************************************************************}
function TAlTrivialWebSpider.GetNextLocalFileName(const aContentType: AnsiString): AnsiString;
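{Build the local file name for the next downloaded url: the first page is saved as
'Start' + extension directly in SaveDirectory, the following ones are numbered and
spread over subdirectories holding SplitDirectoryAmount files each (with the default
of 5000: files 1..4999 go to '5000\', files 5000..9999 to '10000\', and so on); the
extension is derived from the mime content type.}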
Var aExt: AnsiString;
{-----------------------------------------}
Function SplitPathMakeFilename: AnsiString;
begin
Result := fSaveDirectory + ALIntToStr((FCurrentLocalFileNameIndex div fSplitDirectoryAmount) * fSplitDirectoryAmount + fSplitDirectoryAmount) + '\';
If (not DirectoryExists(string(Result))) and (not createDir(string(Result))) then raise EALException.CreateFmt('cannot create dir: %s', [Result]);
Result := Result + ALIntToStr(FCurrentLocalFileNameIndex) + aExt;
inc(FCurrentLocalFileNameIndex);
end;
Begin
if fSaveDirectory = '' then result := ''
else begin
aExt := ALlowercase(ALGetDefaultFileExtFromMimeContentType(aContentType)); // '.htm'
If FCurrentLocalFileNameIndex = 0 then Begin
result := fSaveDirectory + 'Start' + aExt;
inc(FCurrentLocalFileNameIndex);
end
else result := SplitPathMakeFilename;
end;
end;
{************************************************************************}
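{Mark the url that failed as downloaded (data = '!') so it will not be retried,
record the error message and report progress.}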
procedure TAlTrivialWebSpider.WebSpiderCrawlDownloadError(Sender: TObject;
const URL, ErrorMessage: AnsiString;
HTTPResponseHeader: TALHTTPResponseHeader;
var StopCrawling: Boolean);
Var aNode: TAlTrivialWebSpider_PageDownloadedBinTreeNode;
begin
{add the url to downloaded list}
aNode:= TAlTrivialWebSpider_PageDownloadedBinTreeNode.Create;
aNode.ID := Url;
aNode.data := '!';
If not FPageDownloadedBinTree.AddNode(aNode) then aNode.Free;
{delete the url from the not yet downloaded list}
FPageNotYetDownloadedBinTree.DeleteNode(url);
{report progress}
if assigned(fLstErrorEncountered) then fLstErrorEncountered.Add(ErrorMessage);
if assigned(fOnCrawlProgress) then fOnCrawlProgress(self,FPageNotYetDownloadedBinTree.nodeCount,FPageDownloadedBinTree.nodeCount, Url);
end;
{***************************************************************************}
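{Mark the redirected url as downloaded (data = '=>' + target url) and, if the target
stays in the start domain (or StayInStartDomain is off), queue it for download at
the current depth.}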
procedure TAlTrivialWebSpider.WebSpiderCrawlDownloadRedirect(Sender: TObject;
const Url, RedirectedTo: AnsiString;
HTTPResponseHeader: TALHTTPResponseHeader;
var StopCrawling: Boolean);
Var aNode: TALStringKeyAVLBinaryTreeNode;
aRedirectToWithoutAnchor: ansiString;
begin
{add the url to downloaded list}
aNode:= TAlTrivialWebSpider_PageDownloadedBinTreeNode.Create;
aNode.ID := Url;
TAlTrivialWebSpider_PageDownloadedBinTreeNode(aNode).data := '=>'+RedirectedTo;
If not FPageDownloadedBinTree.AddNode(aNode) then aNode.Free;
{delete the url from the not yet downloaded list}
FPageNotYetDownloadedBinTree.DeleteNode(url);
{Stay in start site}
If not fStayInStartDomain or
(ALlowercase(AlExtractHostNameFromUrl(ALTrim(fStartUrl))) = ALlowercase(AlExtractHostNameFromUrl(RedirectedTo))) then begin
{remove the anchor}
aRedirectToWithoutAnchor := AlRemoveAnchorFromUrl(RedirectedTo);
{add the redirectTo url to the not yet downloaded list}
If FPageDownloadedBinTree.FindNode(aRedirectToWithoutAnchor) = nil then begin
aNode:= TAlTrivialWebSpider_PageNotYetDownloadedBinTreeNode.Create;
aNode.ID := aRedirectToWithoutAnchor;
TAlTrivialWebSpider_PageNotYetDownloadedBinTreeNode(aNode).DeepLevel := FCurrentDeepLevel;
If not FPageNotYetDownloadedBinTree.AddNode(aNode) then aNode.Free;
end;
end;
{report progress}
if assigned(fOnCrawlProgress) then fOnCrawlProgress(self,FPageNotYetDownloadedBinTree.nodeCount,FPageDownloadedBinTree.nodeCount, Url);
end;
{**************************************************************************}
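{Sniff the real mime content type, save the response under the next local file name
(html pages get a '<!-- saved from url -->' banner so their links can be rewritten
later), and move the url from the pending tree to the downloaded tree, storing the
local path relative to SaveDirectory in the node data.}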
procedure TAlTrivialWebSpider.WebSpiderCrawlDownloadSuccess(Sender: TObject;
const Url: AnsiString;
HTTPResponseHeader: TALHTTPResponseHeader;
HttpResponseContent: TStream;
var StopCrawling: Boolean);
Var aNode: TAlTrivialWebSpider_PageDownloadedBinTreeNode;
Str: AnsiString;
AFileName: AnsiString;
pMimeTypeFromData: LPWSTR;
begin
{put the content in str}
HttpResponseContent.Position := 0;
SetLength(Str, HttpResponseContent.size);
HttpResponseContent.ReadBuffer(pointer(Str)^,HttpResponseContent.Size);
{we add a check here to be sure that the file is an html file (text file)}
{because some servers send images with a text/html content type}
IF (FindMimeFromData(
nil, // bind context - can be nil
nil, // url - can be nil
PAnsiChar(str), // buffer with data to sniff - can be nil (pwzUrl must be valid)
length(str), // size of buffer
PWidechar(WideString(HTTPResponseHeader.ContentType)), // proposed mime type, if any - can be nil
0, // will be defined
pMimeTypeFromData, // the suggested mime
0 // must be 0
) <> NOERROR) then pMimeTypeFromData := PWidechar(WideString(HTTPResponseHeader.ContentType));
{Get the FileName where to save the responseContent}
aFileName := GetNextLocalFileName(AnsiString(pMimeTypeFromData));
{If html then add '<!-- saved from '+ URL +' -->' at the top of the file}
if aFileName <> '' then begin
If ALSameText(AnsiString(pMimeTypeFromData),'text/html') then begin
Str := '<!-- saved from '+ URL+' -->' +#13#10 + Str;
AlSaveStringToFile(str,aFileName);
end
{Else Save the file without any change}
else TmemoryStream(HttpResponseContent).SaveToFile(String(aFileName));
end;
{delete the Url from the PageNotYetDownloadedBinTree}
FPageNotYetDownloadedBinTree.DeleteNode(Url);
{add the url to the PageDownloadedBinTree}
aNode:= TAlTrivialWebSpider_PageDownloadedBinTreeNode.Create;
aNode.ID := Url;
aNode.data := AlCopyStr(AFileName,length(fSaveDirectory) + 1,maxint);
If not FPageDownloadedBinTree.AddNode(aNode) then aNode.Free;
{report progress}
if assigned(fLstUrlCrawled) then fLstUrlCrawled.add(Url);
if assigned(fOnCrawlProgress) then fOnCrawlProgress(self,FPageNotYetDownloadedBinTree.nodeCount,FPageDownloadedBinTree.nodeCount, Url);
end;
{*******************************************************************}
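{Decide whether a link found during the crawl should be queued: images are skipped
unless DownloadImage is set, links leaving the start host are skipped when
StayInStartDomain is set, MaxDeepLevel and the Include/Exclude masks are applied,
and the OnCrawlFindLink event may still veto the url before it is added to the
tree of not-yet-downloaded pages.}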
procedure TAlTrivialWebSpider.WebSpiderCrawlFindLink(Sender: TObject;
const HtmlTagString: AnsiString;
HtmlTagParams: TALStrings;
const URL: AnsiString);
Var aNode: TAlTrivialWebSpider_PageNotYetDownloadedBinTreeNode;
aURLWithoutAnchor: ansiString;
Lst: TALStringList;
I: integer;
Flag1 : Boolean;
S1: AnsiString;
begin
{skip images when DownloadImage is false}
IF not fDownloadImage and
(
ALSameText(HtmlTagString,'img') or
(
ALSameText(HtmlTagString,'input') and
ALSameText(ALTrim(HtmlTagParams.Values['type']),'image')
)
)
then Exit;
{Stay in start site}
If fStayInStartDomain and
(ALlowercase(AlExtractHostNameFromUrl(ALTrim(fStartUrl))) <> ALlowercase(AlExtractHostNameFromUrl(Url))) then exit;
{DeepLevel}
If (fMaxDeepLevel >= 0) and (FCurrentDeepLevel + 1 > fMaxDeepLevel) then exit;
{include link(s)}
If fIncludeMask <> '' then begin
Lst := TALStringList.Create;
Try
Lst.Text := ALTrim(AlStringReplace(FIncludeMask,';',#13#10,[RfReplaceall]));
Flag1 := True;
For i := 0 to Lst.Count - 1 do begin
S1 := ALTrim(Lst[i]);
If S1 <> '' then begin
Flag1 := ALMatchesMask(URL, S1);
If Flag1 then Break;
end;
end;
If not flag1 then Exit;
Finally
Lst.Free;
end;
end;
{Exclude link(s)}
If fExcludeMask <> '' then begin
Lst := TALStringList.Create;
Try
Lst.Text := ALTrim(AlStringReplace(fExcludeMask,';',#13#10,[RfReplaceall]));
Flag1 := False;
For i := 0 to Lst.Count - 1 do begin
S1 := ALTrim(Lst[i]);
If S1 <> '' then begin
Flag1 := ALMatchesMask(URL, S1);
If Flag1 then Break;
end;
end;
If flag1 then Exit;
Finally
Lst.Free;
end;
end;
{remove the anchor}
aURLWithoutAnchor := AlRemoveAnchorFromUrl(URL);
{call OnCrawlFindLink}
Flag1 := False;
if assigned(fOnCrawlFindLink) then fOnCrawlFindLink(Sender,
HtmlTagString,
HtmlTagParams,
aURLWithoutAnchor,
Flag1);
if Flag1 then exit;
{If the link not already downloaded then add it to the FPageNotYetDownloadedBinTree}
If FPageDownloadedBinTree.FindNode(aURLWithoutAnchor) = nil then begin
aNode:= TAlTrivialWebSpider_PageNotYetDownloadedBinTreeNode.Create;
aNode.ID := aURLWithoutAnchor;
aNode.DeepLevel := FCurrentDeepLevel + 1;
If not FPageNotYetDownloadedBinTree.AddNode(aNode) then aNode.Free;
end;
end;
{********************************************************************************************}
procedure TAlTrivialWebSpider.WebSpiderCrawlGetNextLink(Sender: TObject; var Url: AnsiString);
{------------------------------------------------------------------------------------------------}
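{Search the tree of not-yet-downloaded urls for a node whose DeepLevel does not
exceed alowDeepLevel; if none is found on the explored path, return the shallowest
node encountered, so that the crawl stays close to the current depth when
MaxDeepLevel is set.}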
function InternalfindNextUrlToDownload(aNode: TAlTrivialWebSpider_PageNotYetDownloadedBinTreeNode;
alowDeepLevel: Integer): TAlTrivialWebSpider_PageNotYetDownloadedBinTreeNode;
Var aTmpNode1, aTmpNode2: TAlTrivialWebSpider_PageNotYetDownloadedBinTreeNode;
Begin
If (not assigned(Anode)) or (aNode.DeepLevel <= alowDeepLevel) then result := aNode
else begin
if aNode.ChildNodes[true] <> nil then begin
aTmpNode1 := InternalfindNextUrlToDownload(TAlTrivialWebSpider_PageNotYetDownloadedBinTreeNode(aNode.ChildNodes[true]), alowDeepLevel);
If (assigned(aTmpNode1)) and (aTmpNode1.DeepLevel <= alowDeepLevel) then begin
result := aTmpNode1;
exit;
end;
end
else aTmpNode1 := nil;
if aNode.ChildNodes[false] <> nil then begin
aTmpNode2 := InternalfindNextUrlToDownload(TAlTrivialWebSpider_PageNotYetDownloadedBinTreeNode(aNode.ChildNodes[false]), alowDeepLevel);
If (assigned(aTmpNode2)) and (aTmpNode2.DeepLevel <= alowDeepLevel) then begin
result := aTmpNode2;
exit;
end;
end
else aTmpNode2 := nil;
result := aNode;
If assigned(aTmpNode1) and (result.deepLevel > aTmpNode1.deeplevel) then result := aTmpNode1;
If assigned(aTmpNode2) and (result.deepLevel > aTmpNode2.deeplevel) then result := aTmpNode2;
end;
end;
Var aNode: TAlTrivialWebSpider_PageNotYetDownloadedBinTreeNode;
begin
{If there are more urls to download}
IF FPageNotYetDownloadedBinTree.NodeCount > 0 then begin
{Find the next url with a deeplevel close to FCurrentDeepLevel}
If fMaxDeepLevel >= 0 then aNode := InternalfindNextUrlToDownload(TAlTrivialWebSpider_PageNotYetDownloadedBinTreeNode(FPageNotYetDownloadedBinTree.head), FCurrentDeepLevel)
{Find the next url without taking care of FCurrentDeepLevel}
else aNode := TAlTrivialWebSpider_PageNotYetDownloadedBinTreeNode(FPageNotYetDownloadedBinTree.head);
Url := aNode.ID;
FCurrentDeepLevel := TAlTrivialWebSpider_PageNotYetDownloadedBinTreeNode(aNode).DeepLevel;
end
{If there are no more urls to download then exit}
else begin
Url := '';
FCurrentDeepLevel := -1;
end;
end;
{***********************************************************************************}
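{Translate a url found in a saved page into the local path of its downloaded copy:
follow '=>' redirection entries, return an empty path for urls that failed ('!'),
convert backslashes to slashes, re-append the anchor, and prefix '../' for the
numbered files stored in subdirectories (Start.htm, saved at the root, gets no prefix).}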
procedure TAlTrivialWebSpider.WebSpiderUpdateLinkToLocalPathFindLink(Sender: TObject;
const HtmlTagString: AnsiString;
HtmlTagParams: TALStrings;
const URL: AnsiString;
var LocalPath: AnsiString);
Var aNode: TALStringKeyAVLBinaryTreeNode;
aTmpUrl: ansiString;
aAnchorValue: AnsiString;
begin
LocalPath := '';
If Url <> '' then begin
aTmpUrl := URL;
While True Do begin
aTmpUrl := AlRemoveAnchorFromUrl(aTmpUrl, aAnchorValue);
aNode := FPageDownloadedBinTree.FindNode(aTmpUrl);
If (aNode <> nil) then begin
LocalPath := TAlTrivialWebSpider_PageDownloadedBinTreeNode(aNode).Data;
If AlPos('=>',LocalPath) = 1 then Begin
aTmpUrl := AlCopyStr(LocalPath,3,MaxInt);
LocalPath := '';
end
else Break;
end
else Break;
end;
If LocalPath = '!' then localpath := ''
else If LocalPath <> '' then begin
LocalPath := AlStringReplace(LocalPath,
'\',
'/',
[RfReplaceall]) + aAnchorValue;
If (FCurrentLocalFileNameIndex >= 0) then LocalPath := '../' + LocalPath;
end;
end;
end;
{**************************************************************************************}
procedure TAlTrivialWebSpider.WebSpiderUpdateLinkToLocalPathGetNextFile(Sender: TObject;
var FileName, BaseHref: AnsiString);
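{Enumerate, from the most recently created one back to 'Start.htm', the local files
produced by the crawl, and recover each file's original url (returned as BaseHref)
from the '<!-- saved from url -->' banner written at the top of every saved html page.}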
{-----------------------------------------}
Function SplitPathMakeFilename: AnsiString;
begin
If FCurrentLocalFileNameIndex < 0 then result := ''
else If FCurrentLocalFileNameIndex = 0 then result := fSaveDirectory + 'Start.htm'
else Result := fSaveDirectory + ALIntToStr((FCurrentLocalFileNameIndex div SplitDirectoryAmount) * SplitDirectoryAmount + SplitDirectoryAmount) + '\' + ALIntToStr(FCurrentLocalFileNameIndex) + '.htm';
dec(FCurrentLocalFileNameIndex);
end;
Begin
if fSaveDirectory = '' then FileName := ''
else begin
{Find FileName}
FileName := SplitPathMakeFilename;
While (FileName <> '') and not fileExists(string(FileName)) do
Filename := SplitPathMakeFilename;
end;
{if filename found}
If FileName <> '' then Begin
{Extract the Base Href}
BaseHref := AlGetStringFromFile(FileName);
BaseHref := ALTrim(AlCopyStr(BaseHref,
17, // '<!-- saved from ' + URL
AlPos(#13,BaseHref) - 21)); // URL + ' -->' +#13#10
{report progress}
if assigned(fOnUpdateLinkToLocalPathProgress) then fOnUpdateLinkToLocalPathProgress(self, FileName);
end
else BaseHref := '';
end;
end.