Troubleshooting

There are several test and debug modes for troubleshooting problems with the Feeder script or with certain feeds, or for monitoring the Feeder operation. These troubleshooting options are not meant for testing filter and URL configuration for specific feeds. For that, see the page "Testing if you get the desired content" (and the test_url.php and test_feed.php scripts).

config.ini:

testing = 1 (0 or 1)
debuglevel = 2 (0 - 7)

testing = 1 is the setup default, so you cannot accidentally hose your setup when starting.
Set testing = 0 when you want to go live.

Testing doesn't actually get the articles; it only processes the feed and shows which articles it would fetch in this run. Testing means that no data is written: there are no updates to data.ini and no .dat files are written to the import temp directory. And the import will not be run, of course.

This also means that the "Last run" date will be "wrong" insofar as it reflects only the date the Feeder was last run in normal mode. So, don't get confused by this!

Debug output (debuglevel) is on whether testing is enabled or not. debuglevel = 0 will not generate any debug output (nor will running with php-win.exe).

At the moment there are up to 7 debug levels (1 - 7; 0 means no output). For normal operation keep debuglevel at the level you want to have all the time. (If run via VA or scheduled this won't add any output overhead since we use php-win.exe for that.)

There is a special testing mode for the command line:

php feeder.php 6       test feed id 6 (see here for detailed id explanation)
php feeder.php test    test all feeds

Both automatically set debuglevel to 10, are a one-time option and run the Feeder in testing mode (no matter what is set in config.ini). So, it's meant for testing without having to change config.ini.

The command-line option is especially meant for troubleshooting a problem with a specific feed, like why it suddenly stopped getting new articles. (If you want to test filters and content extraction, you are better off using the provided test scripts.)

If you mistype the option, the Feeder will go into test mode, try to test that feed, fail, and stop. If the option contains anything that resembles "test" (like "mytest", "testing", "test 6" and so on), the Feeder will go into the "test all feeds" mode. So, whatever you type as a command-line option, the Feeder will always go into test mode and not accidentally hose your current settings.
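The rule described above can be sketched roughly as follows. This is only an illustration in Python, not the Feeder's actual PHP code: the function name interpret_argument and the mode labels are hypothetical, and it assumes that "resembles test" means the argument contains the substring "test".

```python
def interpret_argument(arg):
    """Map one command-line argument to a (mode, feed_id) pair.

    Hypothetical helper; it only mirrors the behaviour described in the text.
    """
    if "test" in arg:
        # Assumption: anything containing "test" ("mytest", "testing",
        # "test 6") selects the "test all feeds" mode.
        return ("test_all", None)
    # Everything else is taken as a feed id. A mistyped id fails later when
    # the feed is looked up, but the run is still a test run.
    return ("test_feed", arg)


for example in ["6", "test", "mytest", "test 6", "tset"]:
    print(example, "->", interpret_argument(example))
```

Either way the result is a test run, which is why a typo cannot touch the live data.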

For full debug levels you get an output similar to this for each article currently available in the feed:

Feed Microsoft IEBlog (9)
15: The Countdown to Mix07 Has Started!
  new: 1 - rss2: Mon, 26 Feb 2007 23:39:00 GMT
   Dates: 1172533140 1172533139 1172533139
      link: URL

Explanation:

Feed Microsoft IEBlog (9)                 feed name and (feed id)
15: The Countdown to Mix07 Has Started!   number (count) of the article and its title
new: 1                                    detected as new article (0 = old article)
rss2: Mon, 26 Feb 2007 23:39:00 GMT       version as defined in feeds.ini and date of this article in RFC2822 format
Dates: date of this article - date_last_article - date_last_checked
                                          all three times as Unix timestamps
link: URL                                 article URL (no "fetch" indicates we don't want to fetch it)
fetch link: URL                           article URL ("fetch" indicates that we want to fetch the article)



date of this article: date as announced by the feed (so, the RFC2822 date one line above and this date are identical, just different formats)

date_last_checked: the date the Feeder ran last and checked this feed (disabled feeds will show the date they were last actively checked, not the date the Feeder last ran). In version 0.91 this was the date value relevant for determining whether an article is new or not.

date_last_article: the date of the "newest" article in this feed before starting this run. Starting with version 0.92 this value is used for determining whether an article is new or not. This value is the same as date_last_checked if the feed doesn't provide dates.

date_last_checked and date_last_article are stored in data.ini.

For a normal, regularly fetched feed the date_last_article (second value) should be the same as the date of the last article fetched. So, the first and second date should be the same for the last article of this feed you see in the debug output. The second and third values in the Dates: line should each stay the same across all articles of a feed. The third value (date_last_checked) should be the same for all articles and feeds.
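To read a Dates: line by hand, it can help to convert the first timestamp back into the RFC2822 form and compare it with date_last_article. The following Python sketch does that for the sample output above. The helper names parse_dates_line and is_new are hypothetical, and the strict "greater than" comparison is an assumption about how version 0.92 decides; the Feeder's real logic may differ.

```python
from email.utils import formatdate


def parse_dates_line(line):
    """Split a 'Dates:' debug line into its three Unix timestamps."""
    label, _, rest = line.strip().partition(":")
    if label != "Dates":
        raise ValueError("not a Dates line: %r" % line)
    article, last_article, last_checked = (int(v) for v in rest.split())
    return article, last_article, last_checked


def is_new(article, last_article):
    # Assumption: "new" means strictly newer than date_last_article.
    return article > last_article


article, last_article, last_checked = parse_dates_line(
    "   Dates: 1172533140 1172533139 1172533139"
)
print(formatdate(article, usegmt=True))  # Mon, 26 Feb 2007 23:39:00 GMT
print("new:", int(is_new(article, last_article)))  # new: 1
```

For the sample article the first value is one second later than date_last_article, which matches the new: 1 shown in the debug output.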

More info on troubleshooting feeds that stopped fetching new articles can be found in the readme for data.ini.

Finally, if you have a feed where you regularly get old articles presented as "new" this is not necessarily our fault!
It's more likely that either:

  • The creation of the feed list is somewhat borked at the feed provider.
    This can happen with "no date" feeds where the provider doesn't care about the order of appearance or changes it, or where their database backend is somewhat hosed and returns the articles in no particular order. We can't do anything about this. Typical broken feeds of this kind are the KB article feeds by Microsoft. version_date = "none" should cope with this as of version 0.92 of the Feeder.
  • The provider of the feed regularly updates the publishing date of articles although they did not change.
    This can only happen with feeds that provide dates. Some providers seem to have publishing software that changes the publishing date of an article each time something is changed in the record of this article. Or maybe they do it intentionally. Don't know. But if a date changes we have to rely on that announcement and assume the article got updated with new information and so fetch it again. There is no good way to check *reliably* if an article really got something new, so I don't even attempt that. We have to live with that. Heise typically does this in their feeds.

Both of these problems can easily be checked by looking at the feed with your browser.

If you want, complain in both cases to the provider of the feed. In case of "no date" feeds ask them to provide a feed with dates. There is no good reason not to provide a date. Claiming that RSS 0.9 (which doesn't know dates) is the most widespread format and that's why they use it is bullshit. Any decent RSS reader knows at least RSS 1.0.

Short description of debug levels

Each debug level basically adds another line with additional information.

0   no output
1   outputs only the Feed name
2   adds article names
3   adds "new-ness" status of article and date formats
4   adds Dates line
5   adds URL of the article
6   adds filter usage/success
7   adds name of the file we write the article to
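
Because the levels are cumulative, the table can be read as "show every item whose level is at or below debuglevel". A small Python sketch of that reading follows; the names DEBUG_FIELDS and fields_for_level are hypothetical and the labels are shortened from the table above.

```python
# (minimum debug level, what that level adds), taken from the table above.
DEBUG_FIELDS = [
    (1, "feed name"),
    (2, "article names"),
    (3, "new-ness status and date formats"),
    (4, "Dates line"),
    (5, "article URL"),
    (6, "filter usage/success"),
    (7, "output file name"),
]


def fields_for_level(level):
    """Return everything that is shown at the given debug level."""
    return [name for minimum, name in DEBUG_FIELDS if level >= minimum]


print(fields_for_level(0))
print(fields_for_level(2))
print(fields_for_level(7))
```

In this sketch any level above 7 simply selects all seven items, which is presumably why the command-line test mode can set debuglevel to 10.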
 
vaosfeeds/troubleshooting.txt · Last modified: 29.06.2007 00:31 by kai
 