> Multiple Source Xmltv Downloader., SafeXMLTV
Ratall
post Nov 21 2010, 02:30 AM
Post #301


Participant


Group: New Members
Posts: 24
Joined: 24-October 09
From: Australia
Member No.: 12,668
Card: DNTV TinyTwin USB


QUOTE (Calvi @ Nov 13 2010, 11:53 AM) *
Version 10.0 Released (Finally!!!!!!!!!!!)

10.0 (complete package)

*<repeatexclude> now uses regular expressions.



Hi
I am not 100% certain, but I believe the logic has become reversed on the repeatexclude option.

CODE
If Not oRepeatSortedList.Contains(repeat_index) Then
If VBSRegExpMatch(oConfigDictionary.Item("repeatexclude"),_
repeat_subtitle,True,True) Then
oRepeatSortedList.Add repeat_index, oRepeatSortedList.Count
End If
End If



should read

CODE
If Not oRepeatSortedList.Contains(repeat_index) Then
If not VBSRegExpMatch(oConfigDictionary.Item("repeatexclude"),_
repeat_subtitle,True,True) Then
oRepeatSortedList.Add repeat_index, oRepeatSortedList.Count
End If
End If


My recorded items are not being added to the repeat list,
and I think the above is the cause (my experience of VBS is limited, so I may be mistaken).
I will be testing tomorrow if I get time and will let you know if the change fixes it.

thanks again for the hard work

Rick
Calvi
post Nov 21 2010, 08:42 AM
Post #302


Forum Regular


Group: Members
Posts: 829
Joined: 8-November 04
Member No.: 1,988
Card: DVICO FusionHDTV


Thanks Rick,

100% correct!

Code was...

CODE
If InStr(oConfigDictionary.Item("repeatexclude"),repeat_subtitle) = 0 Then


Changed to

CODE
          If VBSRegExpMatch(oConfigDictionary.Item("repeatexclude"),_
                  repeat_subtitle,True,True) Then

Should be...

CODE
          If Not VBSRegExpMatch(oConfigDictionary.Item("repeatexclude"),_
                  repeat_subtitle,True,True) Then


Fixed in 10.1 Released Now.

Thanks again for spotting that, glad to see someone is paying attention. (Can't believe I didn't notice the repeat list count had stalled!).
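
The inversion is easy to see in a small sketch. This is Python rather than the VBScript SafeXMLTV uses, and it only assumes that `VBSRegExpMatch` returns true when the pattern matches the subtitle; the function name, the sample pattern and the case-insensitive matching are all made up for illustration.

```python
import re

def should_add_to_repeat_list(repeat_exclude: str, subtitle: str) -> bool:
    """Return True when the subtitle is NOT matched by the exclude pattern."""
    # Hypothetical stand-in for "Not VBSRegExpMatch(...)": items that match
    # the exclude pattern are skipped, everything else goes on the list.
    return re.search(repeat_exclude, subtitle, re.IGNORECASE) is None

# Hypothetical exclude pattern: never treat news or weather as repeats.
exclude = r"news|weather"
subtitles = ["Evening News", "Pilot", "Weather Update"]
kept = [s for s in subtitles if should_add_to_repeat_list(exclude, s)]
print(kept)  # ['Pilot']
```

Without the negation, only the excluded items would be added, which fits a repeat list whose count stalls.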

This post has been edited by Calvi: Nov 21 2010, 08:43 AM
Ratall
post Nov 21 2010, 09:49 PM
Post #303


Calvi
Thanks for your response.
I would not have noticed either, but I have to apply a slight mod to your code to make the repeat function useful for me.
As part of my procedure for applying it to a new version I follow a checklist, which highlighted the problem.

Rick


Calvi
post Nov 22 2010, 08:38 AM
Post #304


Rick,
what's the slight mod? Is it the XML tag that is set?
Ratall
post Nov 22 2010, 09:31 AM
Post #305


QUOTE (Calvi @ Nov 22 2010, 09:38 AM) *
Rick,
what's the slight mod? Is it the XML tag that is set?



Calvi

I use DV Scheduler with IceTV and other sources, which presents a couple of problems:

1. IceTV has its own <previously-shown> entries, which is a problem as I am only interested in stuff I've already seen.
2. DV Scheduler knows about repeats but not <previously-shown>?

So I added a category, and then found that DV Scheduler only picks up the first category in a programme (I have contacted the programmer and he is looking into that).
It should be noted that IceTV provides multiple categories on individual programmes anyway.
This is my change (the line under the 'RATFIX comment):

CODE

If oRepeatSortedList.Contains(repeat_index) Then
    xml_data = Replace(xml_data,"</programme>",_
        "<previously-shown></previously-shown>" & _
        vblf & "</programme>")
    'RATFIX
    xml_data = Replace(xml_data, "<category" , "<category>previously-captured</category><category")

    Repeats = Repeats + 1
    LogLine "EPG Repeat Item..." & repeat_index,True,4,False,"black"
End If
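
One thing worth checking with this mod: VBScript's Replace changes every occurrence unless a count is passed, so a programme with two categories would get the extra category twice, and a programme with no category gets nothing. A rough Python sketch of the difference (not the actual script; the function name and sample programme are invented):

```python
def tag_previously_captured(programme_xml: str, first_only: bool = True) -> str:
    """Insert a 'previously-captured' category in front of existing categories."""
    marker = "<category>previously-captured</category>"
    # count=1 touches only the first "<category"; -1 means "replace all",
    # which mirrors a Replace call without a count argument.
    count = 1 if first_only else -1
    return programme_xml.replace("<category", marker + "<category", count)

sample = (
    '<programme start="20110130133000 +0000" channel="2397">'
    '<title lang="en">Tennis: Australian Open</title>'
    '<category lang="en">Sport</category>'
    '<category lang="en">Tennis</category>'
    '</programme>'
)
print(tag_previously_captured(sample).count("previously-captured"))         # 1
print(tag_previously_captured(sample, False).count("previously-captured"))  # 2
```

Searching for "<category" without the closing ">" is what lets the match work when the tag carries a lang attribute.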


I have also preprocessed the capture.xml files, as sometimes the wrong programme would be picked up.
The preprocessing removes any epg_item where the epg_title does not match the ws_name.
I have attached a copy of an unprocessed capture.xml for your amusement.

Using a category also means I can search as well as schedule programmes.
Once the problem with DV Scheduler is fixed this will be even more useful.

Anyway, for better or worse, that's what I did. Bet you wish you never asked.

Rick
Calvi
post Nov 27 2010, 11:15 AM
Post #306


QUOTE
Bet you wish you never asked


No, I thought it might be the XML tag so was thinking of adding as a configuration option.

The special requirements, eg first category etc should be fixed in dv scheduler though rather than worked around.

I'll look into it further when I get some time.

JC.
Ratall
post Nov 27 2010, 10:09 PM
Post #307


QUOTE (Calvi @ Nov 27 2010, 12:15 PM) *
No, I thought it might be the XML tag so was thinking of adding as a configuration option.

The special requirements, eg first category etc should be fixed in dv scheduler though rather than worked around.

I'll look into it further when I get some time.

JC.


I agree the first-category thing is a problem that should be resolved in DV Scheduler. (I only mentioned it to explain the way I added the category entry.)

Though being able to use a category to indicate that a programme has been previously captured has some advantages:
being able to set the display colour in the EPG differently and using it in searches, for example.

Question: what is the size limit on the repeat list? Would it be possible to add an optional timeout on the repeat list entries so they can be ignored or removed after a user-defined period?

I was also thinking about categories and the way different sources use different names for the same categories, for example:

Movie vs Movies
Comedy vs Comedies
Animation vs Anime vs Cartoon etc

(the last one will offend the purists)

It would be nice if there could be a conversion done via a list of accepted categories and their unacceptable equivalents,

so the program replaces the unacceptable with the acceptable.

eg
CODE
<category>Comedies</category> replaced with <category>Comedy</category>

<category lang="en">Anime</category> replaced with <category lang="en">Animation</category>

<category>Cartoon</category> replaced with <category>Animation</category>


possibly using an entry in the config such as

CODE
<categoryfix>
Comedy::Comedies;;
Animation::Anime;;
Animation::Cartoon
</categoryfix>
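
To make the idea concrete, here is a rough Python sketch of how such an entry could be read and applied. The `<categoryfix>` tag is only the proposal above, not an existing SafeXMLTV option; the function names are invented, and matching names case-insensitively is an extra assumption.

```python
import re

def parse_categoryfix(text: str) -> dict:
    """Map each unwanted category name (lower-cased) to its accepted name."""
    mapping = {}
    for entry in text.split(";;"):
        entry = entry.strip()
        if not entry:
            continue
        # Each entry reads Accepted::Unwanted, as in the proposal.
        accepted, unwanted = (part.strip() for part in entry.split("::", 1))
        mapping[unwanted.lower()] = accepted
    return mapping

def fix_categories(programme_xml: str, mapping: dict) -> str:
    """Swap unwanted category names for accepted ones, keeping attributes."""
    def swap(match):
        name = match.group(2)
        return match.group(1) + mapping.get(name.strip().lower(), name) + match.group(3)
    return re.sub(r"(<category[^>]*>)([^<]*)(</category>)", swap, programme_xml)

config = """
Comedy::Comedies;;
Animation::Anime;;
Animation::Cartoon
"""
mapping = parse_categoryfix(config)
print(fix_categories('<category lang="en">Anime</category>', mapping))
# <category lang="en">Animation</category>
```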


It's just an idea.
If it's possible I think it makes sense.

Again, thanks for all your hard work; your software is great and really needed.

Rick
Calvi
post Jan 28 2011, 08:43 AM
Post #308


Version 10.2 Released - Major Upgrade!


(Rick, have taken your suggestions on board, taken 2 months but better late than never).
PS. TVSP has the same bug/feature of only accepting the first category.


CODE
10.2 (complete package)
  *Added Channel Eleven (Melbourne Only)
  *Changed <filter> to Pre Replace Feature
    <prereplace> = a list of items to be replaced for data consistency or other needs BEFORE each source is looked at.
               this is a regex expression containing characters to be filtered out. eg.
               [\x00-\x08\x0b\x0c\x0e-\x1f\x7f-\xa0] aggressive filter
               [\x00-\x08\x0b\x0c\x0e-\x1f\x7f]      less aggressive
               Also can strip /replace words etc. Eg. movie: for IceTV.
               leave blank to apply no filtering.
                Note: This filter is run per source prior to matchlist, merging checks etc.
                Eg. The following example removes invalid characters and Movie:, (New) and -Series Premiere from the title.
                  <![CDATA[
                  [\x00-\x08\x0b\x0c\x0e-\x1f\x7f-\xa0]::;;
                  <title.*>.*?movie.?: ?(.*)?<(?i)::<title>$1<;;
                  <title.*>(.*) ?\(new.*\).*<(?i)::<title>$1<;;
                  <title.*>(.*) - series premiere<(?i)::<title>$1<
                  ]]>        
  *Added Post Replace Feature
    <postreplace> = a list of items to be replaced for data consistency or other needs AFTER guide is compiled.
                The list comprises delimited strings ("::") separated by a (";;") with the following information.
                Search Pattern::Replacement Data
                Search Pattern   - A regex expression to search for within each programme.
                Replacement Data - The data to replace the search pattern with.
                  Note: Use Cdata to escape xml tags, otherwise escape as per xml eg, < = &lt; etc.
                  Eg. The following example replaces Categories, Comedies with Comedy and Anime or Cartoon with Animation.
                    <![CDATA[
                    <category.*>Comedies<(?i)::<category>Comedy<;;
                    <category.*>(Anime|Cartoon)<(?i)::<category>Animation<
                    ]]>
                  Note: An incorrect regex or replace term can make the xml unreadable.
                        SafeXMLTV will check each element for validity before adding to the final guide,
                        but if corrupted it will be removed from the epg thus programmes will be LOST!
                        Corrupted programmes are logged.  
  *Added Repeat Tag Config Item.
    <repeattag> = The configuration for setting a programme as a repeat.
                  This item consists of a delimited string ("::") with the following information.
                  Insertion Point::Repeat Tag Data
                  Insertion Point - The text to search for within each programme to place the repeat data.
                  Repeat Tag Data - The data to insert for a repeat programme.
                    Note: Use Cdata to escape xml tags, otherwise escape as per xml eg, < = &lt; etc.
                    Default: The default inserts before the </programme> tag (ie last) with <previously-shown></previously-shown>
                      <![CDATA[
                      </programme>::<previously-shown></previously-shown></programme>
                      ]]>
                    Eg. An Alternate Example sets the first Category instead (the trailing > is omitted to cater for lang="xx" attributes)...
                      <![CDATA[
                      <category::<category>previously-captured</category>
                      ]]>
  *Added per programme element xml checking for validity before writing the final xml file.
     This is mainly to address the possibility of damaging the xml file with regex replacements
  *Added a setting to enable/disable credits appended to the description.
    <dcredits> = Whether to append credits to the description tag.
                 Note: Credits already in the description are omitted and those missing are added with Additional Credits:
                       If no credits exist in the description then Credits: is appended to the end of the description.
                       If dcredits is set to false then no credits are ever added to the description.
  *Refactored a lot of code for performance improvements (~100% faster).                      
  *Other small improvements.
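
For anyone trying to picture how a "Search Pattern::Replacement Data" list is applied, the following Python sketch shows one plausible reading. It is not the VBScript in xmltv.vbs. It assumes that a trailing `(?i)` means "ignore case" and that `$1` refers to the first capture group; the function names are made up.

```python
import re

def parse_replace_list(text: str):
    """Split 'pattern::replacement;;pattern::replacement' into compiled rules."""
    rules = []
    for entry in text.split(";;"):
        entry = entry.strip()
        if not entry:
            continue
        pattern, _, replacement = entry.partition("::")
        flags = 0
        if pattern.endswith("(?i)"):  # assumption: trailing (?i) = ignore case
            pattern, flags = pattern[:-4], re.IGNORECASE
        # Translate $1-style group references into Python's \g<1> form.
        replacement = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
        rules.append((re.compile(pattern, flags), replacement))
    return rules

def apply_rules(programme_xml: str, rules) -> str:
    """Run every rule, in order, over one programme element."""
    for pattern, replacement in rules:
        programme_xml = pattern.sub(replacement, programme_xml)
    return programme_xml

postreplace = r"""
<category.*>Comedies<(?i)::<category>Comedy<;;
<category.*>(Anime|Cartoon)<(?i)::<category>Animation<
"""
rules = parse_replace_list(postreplace)
print(apply_rules('<category lang="en">anime</category>', rules))
# <category>Animation</category>
```

Note that with these example rules the lang attribute is dropped, because the replacement text writes a plain `<category>` tag.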


This post has been edited by Calvi: Jan 28 2011, 08:44 AM
Ratall
post Jan 28 2011, 06:35 PM
Post #309


Calvi

Thanks for taking my suggestion on board.


I hope to test it out in a couple of days.

Thanks again for your hard work

Rick

PS You might want to update the General notes section in the config to show how these changes fit in; it might save some potential confusion.
Ratall
post Jan 28 2011, 09:03 PM
Post #310


Calvi
just ran my first test and got the following error

CODE
Script: C:\XMLTV\xmltv.vbs
Line:755
Char: 5
Error: Object doesn't support this property or method: 'program_element.TagName'
Code: 800A0186
Source: Microsoft VBScript runtime error


when processing DVBScan data

If you change the following in the config
CODE
<prereplace>
<![CDATA[
[\x00-\x08\x0b\x0c\x0e-\x1f\x7f-\xa0]::;;



to

CODE
<prereplace>
<![CDATA[
<!--.*-->::;;
[\x00-\x08\x0b\x0c\x0e-\x1f\x7f-\xa0]::;;


it resolves the problem by removing some data that the program has problems with


Rick
Ratall
post Jan 29 2011, 05:44 AM
Post #311


Calvi
version 10.2 does not handle comment blocks in the form of
CODE
<!--
some comment
-->


version 10.1 had no problem.

Both DVBScan and EPGstream use this type of comment.

in the previous posting I provided a quick fix for DVBScan (this works because DVBscan restricts the comments to 1 line each)

but EPGStream has comments over multiple lines and this appears to hang the vbs.
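
A likely reason the quick fix only helps with one-line comments: in most regex flavours (VBScript's included, as far as I know) the "." does not match a line break, so `<!--.*-->` can never span lines. The Python sketch below shows that behaviour and the usual `[\s\S]*?` workaround. It says nothing about why the script hangs, and the sample strings are invented.

```python
import re

single_line = "<title>News</title>\n<!-- one-line comment -->\n<desc>x</desc>"
multi_line = "<title>News</title>\n<!--\nsome comment\n-->\n<desc>x</desc>"

line_bound = re.compile(r"<!--.*-->")      # '.' stops at line breaks
any_span = re.compile(r"<!--[\s\S]*?-->")  # [\s\S] also matches line breaks

stripped = "<title>News</title>\n\n<desc>x</desc>"
print(line_bound.sub("", single_line) == stripped)   # True
print(line_bound.sub("", multi_line) == multi_line)  # True: nothing removed
print(any_span.sub("", multi_line) == stripped)      # True
```

The lazy `*?` matters as well: it stops at the first `-->`, so two separate comments are not swallowed together with whatever sits between them.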

apart from this it seems to work great.

great work

Rick



Calvi
post Jan 29 2011, 10:13 AM
Post #312


Rick,

No problem,

will sort out soon.

Thanks for the testing.
Calvi
post Jan 30 2011, 07:40 AM
Post #313


Version 10.3 Released.

10.3 (xmltv.vbs only)
*Fixed bug in handling xml files with text comments embedded in programmes.
Ratall
post Jan 30 2011, 07:37 PM
Post #314


Calvi,

Just been testing 10.3. The embedded text problem is fixed, but I think I have found another error. I have suspected this problem for a while but could not nail it down well enough to report it.

Basically, the program sometimes picks up data from the wrong source.

here goes

this is the sources entry from my test config

CODE
<sources>

Icetv::300::1::+All::Subtitle::Full::+0000::NC;;
epgstream::300::1::+All::Empty::Full::+0000::NC

</sources>



Here is the Icetv entry for the programme I spotted

CODE
<programme start="20110130133000 +0000" stop="20110130173000 +0000" channel="2397" clumpidx="0/1">
<title lang="en">Tennis: Australian Open</title>
<sub-title lang="en">Classic Match - 1991 Semi-final - Edberg v Lendl</sub-title>
<desc lang="en">The 1991 Australian Open was taken by Boris Becker, but the Semi-Final stoush between Lendl and Edberg was one of the best matches of the tournament.</desc>
<category lang="en">Sport</category>
<category lang="en">Tennis</category>
<episode-num system="icetv">18354-98754</episode-num>
<previously-shown start="20101026" />
</programme>


this is the icetv mapfile
CODE
<?xml version="1.0" encoding="ISO-8859-1"?>

<!--
SafeXMLTV Channel Map Settings.
by John Calvi, 2006

Specify each channel required as an xml element.

<name> Denotes the channel name as it should appear in the final guide.
These would be the channels as mapped in Webscheduler etc.
<source> The source tag denotes the name as it appears in the source xml file.
<offset> The offset denotes the offset to apply per channel in timezone format.
i.e. ±hhmm (hh = hours 0-23, mm = minutes 00-59).

-->

<channelmap>

<ABC>
<name>ABC1</name>
<source>2</source>
<offset>+0000</offset>
</ABC>

<ABC2>
<name>ABC2</name>
<source>38</source>
<offset>+0000</offset>
</ABC2>

<ABC3>
<name>ABC3</name>
<source>2410</source>
<offset>+0000</offset>
</ABC3>

<ABC-News>
<name>ABC News 24</name>
<source>100</source>
<offset>+0000</offset>
</ABC-News>

<Seven>
<name>7 Digital</name>
<source>3</source>
<offset>+0000</offset>
</Seven>

<SevenTwo>
<name>7TWO</name>
<source>2397</source>
<offset>+0000</offset>
</SevenTwo>

<SevenHD>
<name>7mate</name>
<source>103</source>
<offset>+0000</offset>
</SevenHD>

<Nine>
<name>Nine Digital</name>
<source>4</source>
<offset>+0000</offset>
</Nine>

<NineGo>
<name>GO</name>
<source>2393</source>
<offset>+0000</offset>
</NineGo>

<NineHD>
<name>GEM</name>
<source>101</source>
<offset>+0000</offset>
</NineHD>

<TEN>
<name>Ten Digital</name>
<source>5</source>
<offset>+0000</offset>
</TEN>

<ELEVEN>
<name>ELEVEN</name>
<source>1700</source>
<offset>+0000</offset>
</ELEVEN>

<OneHD>
<name>One HD</name>
<source>104</source>
<offset>+0000</offset>
</OneHD>

<SBS1>
<name>SBS ONE</name>
<source>1</source>
<offset>+0000</offset>
</SBS1>

<SBSHD>
<name>SBS HD</name>
<source>102</source>
<offset>+0000</offset>
</SBSHD>

<SBS2>
<name>SBS TWO</name>
<source>29</source>
<offset>+0000</offset>
</SBS2>

<TVS>
<name>TVS</name>
<source>2453</source>
<offset>+0000</offset>
</TVS>
</channelmap>



Here is the EPGStream entry for the programme

CODE
<programme start="20110131004500 +1100" stop="20110131050000 +1100" channel="7TWO-NSW">
<title>Australian Open Tennis Classic</title>
<sub-title>1991 Semi-Final - Edberg V Lendl</sub-title>
<desc>The community TV guide system has no description for this program. You can help by adding the missing information: see http://www.oztivo.net/twiki/bin/view/TVGui...s.</desc>
<star-rating>
<value>0/10</value>
</star-rating>
</programme>


this is the epgstream mapfile

CODE
<?xml version="1.0" encoding="ISO-8859-1"?>

<!--
SafeXMLTV Channel Map Settings.
by John Calvi, 2006

Specify each channel required as an xml element.

<name> Denotes the channel name as it should appear in the final guide.
These would be the channels as mapped in Webscheduler etc.
<source> The source tag denotes the name as it appears in the source xml file.
<offset> The offset denotes the offset to apply per channel in timezone format.
i.e. ±hhmm (hh = hours 0-23, mm = minutes 00-59).

-->

<channelmap>

<ABC>
<name>ABC1</name>
<source>ABC-NSW</source>
<offset>+0000</offset>
</ABC>
<ABC2>
<name>ABC2</name>
<source>ABC2</source>
<offset>+0000</offset>
</ABC2>
<ABC3>
<name>ABC3</name>
<source>ABC3</source>
<offset>+0000</offset>
</ABC3>
<ABC-News>
<name>ABC News 24</name>
<source>ABC-News24</source>
<offset>+0000</offset>
</ABC-News>
<Seven>
<name>7 Digital</name>
<source>Seven-Syd</source>
<offset>+0000</offset>
</Seven>
<SevenTwo>
<name>7TWO</name>
<source>7TWO-NSW</source>
<offset>+0000</offset>
</SevenTwo>
<SevenHD>
<name>7mate</name>
<source>7mate</source>
<offset>+0000</offset>
</SevenHD>
<Nine>
<name>Nine Digital</name>
<source>Nine-Syd</source>
<offset>+0000</offset>
</Nine>
<NineGo>
<name>GO</name>
<source>GO</source>
<offset>+0000</offset>
</NineGo>
<NineHD>
<name>GEM</name>
<source>GEM</source>
<offset>+0000</offset>
</NineHD>
<TEN>
<name>Ten Digital</name>
<source>Ten-NSW</source>
<offset>+0000</offset>
</TEN>
<OneDigital>
<name>One Digital</name>
<source>One-NSW</source>
<offset>+0000</offset>
</OneDigital>
<ELEVEN>
<name>ELEVEN</name>
<source>ELEVEN</source>
<offset>+0000</offset>
</ELEVEN>
<SBS1>
<name>SBS ONE</name>
<source>SBS-NSW</source>
<offset>+0000</offset>
</SBS1>
<SBS2>
<name>SBS TWO</name>
<source>SBSTWO-NSW</source>
<offset>+0000</offset>
</SBS2>
<TVS>
<name>TVS</name>
<source>TVS</source>
<offset>+0000</offset>
</TVS>
</channelmap>



Here is the output entry for the programme

CODE
<programme start="20110131004500 +1100" stop="20110131050000 +1100" channel="7TWO">
<title>Australian Open Tennis Classic</title>
<sub-title>1991 Semi-Final - Edberg V Lendl</sub-title>
<desc>The community TV guide system has no description for this program. You can help by adding the missing information: see http://www.oztivo.net/twiki/bin/view/TVGui...s.</desc>
<epg_source>c:\xmltv\location\sydney\epgstream\xmltv1.xml</epg_source>
</programme>



I believe that it should have grabbed the data from icetv not EPGStream.

Sorry I didn't report the problem earlier, but I had problems finding evidence to support my suspicion.
I think this may have been happening since before 10.0, but I'm not 100% sure on that.

I have preserved the full xmltv inputs and output plus the config and mapfiles I used for the test, if you want them.

Sorry for the length of this post, but sometimes I have a little problem explaining things.



Rick
Calvi
post Jan 31 2011, 11:14 AM
Post #315


Rick,

what's happening there is as follows...

1. The icetv one is added to timeslot 1330
2. The epgstream one is set to only add if empty, but its timeslot is 1345 so the slot is deemed empty.
3. The overlap detection then determines they overlap ie.
ice : 1330 to 1730 (4 hrs)
epgstream : 1345 to 1800 (4 hours 15mins)
4. The overlap detection decides to keep the second one (epgstream) as it is longer and terminate the first one (icetv).

In order to prevent this the overlap detection would need to be made smarter, ie have the option to terminate by priority rather than the longer schedule for example.

(Not that this is necessarily smarter as the longer schedule is usually the safest option).
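
The four steps above can be sketched roughly as follows, using the two timestamps from Rick's post. This is Python for illustration only; the dictionary layout and function names are invented and are not how xmltv.vbs stores programmes.

```python
from datetime import datetime

def parse_xmltv_time(stamp: str) -> datetime:
    """Parse an XMLTV timestamp such as '20110131004500 +1100'."""
    return datetime.strptime(stamp, "%Y%m%d%H%M%S %z")

def make(source: str, start: str, stop: str) -> dict:
    return {"source": source,
            "start": parse_xmltv_time(start),
            "stop": parse_xmltv_time(stop)}

def overlaps(a: dict, b: dict) -> bool:
    """Two programmes overlap when each starts before the other stops."""
    return a["start"] < b["stop"] and b["start"] < a["stop"]

def keep_longer(a: dict, b: dict) -> dict:
    """Return the programme with the longer running time (ties keep the first)."""
    return b if (b["stop"] - b["start"]) > (a["stop"] - a["start"]) else a

ice = make("icetv", "20110130133000 +0000", "20110130173000 +0000")
epg = make("epgstream", "20110131004500 +1100", "20110131050000 +1100")

if overlaps(ice, epg):
    print(keep_longer(ice, epg)["source"])  # epgstream
```

Once the +1100 offset is taken into account, the epgstream entry runs 1345 to 1800, which is the 4 hours 15 minutes that wins over the 4 hour IceTV entry.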

This post has been edited by Calvi: Jan 31 2011, 11:16 AM
Ratall
post Jan 31 2011, 03:29 PM
Post #316


QUOTE (Calvi @ Jan 31 2011, 12:14 PM) *
Rick,

what's happening there is as follows...

1. The icetv one is added to timeslot 1330
2. The epgstream one is set to only add if empty, but its timeslot is 1345 so the slot is deemed empty.
3. The overlap detection then determines they overlap ie.
ice : 1330 to 1730 (4 hrs)
epgstream : 1345 to 1800 (4 hours 15mins)
4. The overlap detection decides to keep the second one (epgstream) as it is longer and terminate the first one (icetv).

In order to prevent this the overlap detection would need to be made smarter, ie have the option to terminate by priority rather than the longer schedule for example.

(Not that this is necessarily smarter as the longer schedule is usually the safest option).


Calvi,
I see what you're saying as regards the way the program runs.

But to my way of thinking the IceTV entry effectively occupies all slots from 1330 up to 1730, and I expected Empty to work this way.

If you specify the Empty method for a source (in this case epgstream) and any part of an entry from that source conflicts with an entry from a non-Empty method source (in this case IceTV), the entry from the Empty source should be dropped regardless.

But that's just the way I thought it would work. Of course, if each entry were flagged to indicate the type of merge method specified for its source, the overlap detection could take that into account.

Merging data from multiple sources that have inherent degrees of inaccuracy and conflict is always a major hassle, particularly if there is no true way to specify a relative level of confidence for the individual sources. (GIGO)

Never mind, I don't even watch tennis.


Oh by the way the prereplace and postreplace are great. Thanks for those.

Thanks again for your hard work.

Rick







Calvi
post Jan 31 2011, 05:53 PM
Post #317


I don't disagree,

it's just that the method checks a dictionary key, so I would have to check the entire time range (this would be slow).

I will ponder the above, as I suspect it is not too hard to add some options to the overlap checking to deal with this issue,
ie if I store the guide type (subtitle, empty etc) the overlap detection could prioritise by type then length, for example
(similar to how it will not remove a matchlist item even if another item is longer).
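
A minimal Python sketch of "prioritise by type then length", assuming each programme carries the merge method of its source. The ranking table, field names and function name are hypothetical; nothing here comes from xmltv.vbs.

```python
from datetime import datetime

def parse_xmltv_time(stamp: str) -> datetime:
    return datetime.strptime(stamp, "%Y%m%d%H%M%S %z")

# Hypothetical ranking of merge methods: the lower number wins an overlap.
METHOD_RANK = {"Subtitle": 0, "Empty": 1}

def pick_winner(a: dict, b: dict) -> dict:
    """Prefer the better-ranked merge method, then the longer programme."""
    def sort_key(p):
        length = (p["stop"] - p["start"]).total_seconds()
        return (METHOD_RANK[p["method"]], -length)
    return min(a, b, key=sort_key)

ice = {"source": "icetv", "method": "Subtitle",
       "start": parse_xmltv_time("20110130133000 +0000"),
       "stop": parse_xmltv_time("20110130173000 +0000")}
epg = {"source": "epgstream", "method": "Empty",
       "start": parse_xmltv_time("20110131004500 +1100"),
       "stop": parse_xmltv_time("20110131050000 +1100")}

print(pick_winner(ice, epg)["source"])  # icetv
```

When both sources use the same method the comparison falls back to length, so the existing "longer schedule wins" behaviour is kept for that case.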

This post has been edited by Calvi: Jan 31 2011, 05:54 PM
Ratall
post Jan 31 2011, 11:34 PM
Post #318


Calvi,
Two quick questions. Is the integrity of the XML data checked after the prereplace or only after the postreplace?
And can you translate this
CODE
^(.*)(\r?\n\1)+$::$1

from the postreplace? It's got me a little puzzled.
Rick
Calvi
post Feb 1 2011, 08:15 AM
Post #319


Rick,


1. PreReplace
a. The xml is loaded; on load it is validated by the msxml parser.
b. The pre-replace is executed on the entire xml document.
c. The resulting xml is then re-validated using the msxml parser.
d. If the parser fails, the original unmodified xml is used instead (and an error reported).
2. PostReplace
a. The post-replace is executed on each programme element.
b. Each programme element is then verified to contain start, stop, channel and title.
c. If verify fails the programme is *omitted and an error logged

This allows the prereplace to be quick (as it needs to be applied to each source) and the post replace to be thorough, a faulty regex would then only damage some programs.
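
As a rough illustration of the two stages, here is a Python sketch that uses the standard library parser in place of msxml. It assumes start, stop and channel are attributes of the programme element and title is a child element, as in the XMLTV samples earlier in the thread; the function names and rule format are invented.

```python
import re
import xml.etree.ElementTree as ET

def pre_replace(xml_text: str, rules):
    """Apply rules to the whole document; keep the original if it no longer parses."""
    changed = xml_text
    for pattern, replacement in rules:
        changed = re.sub(pattern, replacement, changed)
    try:
        ET.fromstring(changed)
    except ET.ParseError:
        return xml_text, False  # caller should log the error
    return changed, True

def post_replace(programme_xml: str, rules):
    """Apply rules to one programme; return None when verification fails."""
    changed = programme_xml
    for pattern, replacement in rules:
        changed = re.sub(pattern, replacement, changed)
    try:
        element = ET.fromstring(changed)
    except ET.ParseError:
        return None
    has_attrs = all(element.get(name) for name in ("start", "stop", "channel"))
    has_title = element.find("title") is not None
    return changed if has_attrs and has_title else None

programme = ('<programme start="20110130133000 +0000" stop="20110130173000 +0000" '
             'channel="7TWO"><title>Tennis</title><category>Comedies</category></programme>')
good_rules = [(r"Comedies", "Comedy")]
bad_rules = [(r"</title>", "")]  # a faulty rule that breaks the element

print(post_replace(programme, good_rules) is not None)  # True
print(post_replace(programme, bad_rules))               # None
```

Checking per programme is what limits the damage of a bad rule to the programmes it actually touches.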

*Note to self for next version, I really should revert to the undamaged programme on verification as well (rather than omit it).


CODE
^(.*)(\r?\n\1)+$::$1


This regex searches for duplicate lines and removes them.

Duplicates are common when you are consolidating categories eg..

CODE
<category>News</category>
<category>Current Affairs</category>


would become

CODE
<category>News</category>
<category>News</category>


the regex replaces two repeated lines with the first match to become

CODE
<category>News</category>



This was a cheap way to do it as I didn't have to write any special code to check for duplicates.
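
The pieces of the pattern, shown as a small Python sketch. It assumes the regex is run in multiline mode so that ^ and $ match at each line; whether xmltv.vbs does that is not stated here. Python writes the replacement as \1 where the config uses $1.

```python
import re

# ^(.*)        capture one whole line as group 1
# (\r?\n\1)+   a line break followed by the same text again, one or more times
# $            the run of repeats must end at the end of a line
dedupe = re.compile(r"^(.*)(\r?\n\1)+$", re.MULTILINE)

text = ("<category>News</category>\n"
        "<category>News</category>\n"
        "<title>X</title>")
print(dedupe.sub(r"\1", text))
# <category>News</category>
# <title>X</title>
```

The `\1` inside the pattern is a backreference: it only matches a repeat of whatever group 1 captured, which is how the regex recognises a duplicated line without knowing its content in advance.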

This post has been edited by Calvi: Feb 1 2011, 08:19 AM
Ratall
post Feb 1 2011, 09:11 PM
Post #320


QUOTE (Calvi @ Feb 1 2011, 09:15 AM) *
CODE
^(.*)(\r?\n\1)+$::$1


This regex searches for duplicate lines and removes them.


This was a cheap way to do it as I didn't have to write any special code to check for duplicates.


Calvi,
Problems with the above

1) Not all sources put categories on separate lines

CODE
<category>Cartoon</category><category>Anime</category>


I think would become
CODE
<category>Animation</category><category>Animation</category>



2) What if there is an intervening category?

CODE
<category>Cartoon</category>
<category>Childrens</category>
<category>Anime</category>



I think would become

CODE
<category>Animation</category>
<category>Children</category>
<category>Animation</category>


Unless regexp is a lot more powerful than I think, you might have to look at coding it.
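
A coded version could look something like this Python sketch, which covers both cases above (categories on one line, and an intervening category). It is only an illustration; the function name is invented, and treating names as equal regardless of case or lang attribute is an assumption.

```python
import re

# One category element plus any trailing spaces and line break, so that
# removing a duplicate does not leave an empty line behind.
CATEGORY = re.compile(r"<category[^>]*>([^<]*)</category>[ \t]*(?:\r?\n)?")

def drop_duplicate_categories(programme_xml: str) -> str:
    """Keep the first occurrence of each category name and drop later repeats."""
    seen = set()

    def keep_first(match):
        name = match.group(1).strip().lower()
        if name in seen:
            return ""
        seen.add(name)
        return match.group(0)

    return CATEGORY.sub(keep_first, programme_xml)

same_line = "<category>Animation</category><category>Animation</category>"
print(drop_duplicate_categories(same_line))
# <category>Animation</category>
```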


I still don't get how the code works

I figure
CODE
^(.*)

grabs each line minus the carriage return (newline) combo.
I think I understand
CODE
\r?\n



I think the
CODE
::$1


just outputs the stuff grabbed in the 1st brace set

What I don't get is

CODE
\1)+$


I suspect I'm going braindead in my old age.
The stuff I've read about regexp on the web just gives me a headache and a worrying numbness down my left side.

Anyway, thanks for answering so quickly.

Rick