Universal Parser 0.7.2

Ress Kent · 22 Sep 2015

To strip the unneeded rutor tag links that point to game genres, music genres, and so on:

PHP:

$text = preg_replace("#<a href=\"/tag/(.*?)/(.*?)\" target=\"_blank\">(.*?)</a>#si", "\\2", $text);
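
A minimal sketch of how this replacement behaves on a made-up HTML fragment. The sample markup and the helper name `strip_tag_links` are assumptions for illustration, not taken from rutor or from the mod. Note that the replacement keeps `\\2`, the second path segment of the tag URL, rather than the link text (group 3).

```php
<?php
// Hypothetical helper wrapping the replacement from the post above.
function strip_tag_links($text)
{
    // Groups: 1 = first path segment, 2 = second path segment, 3 = link text.
    // The whole <a> element is replaced by group 2.
    return preg_replace(
        '#<a href="/tag/(.*?)/(.*?)" target="_blank">(.*?)</a>#si',
        '\\2',
        $text
    );
}

// Made-up sample input.
$sample = 'Genre: <a href="/tag/1/Action" target="_blank">Action</a>, '
        . '<a href="/tag/1/RPG" target="_blank">RPG</a>';

echo strip_tag_links($sample), "\n"; // Genre: Action, RPG
```

Links that do not start with `/tag/` are left untouched, so the later `[url=...]` conversion still sees them.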

whuru · 7 Oct 2015

It does not parse from rutor or from noname. With rutracker everything is fine. When I try to parse from rutor, it throws these notices:

Code:

Notice: Undefined offset: 1 in /var/www/library/includes/functions_parser.php on line 71

Notice: Undefined offset: 1 in /var/www/library/includes/functions_parser.php on line 125

PHP:

function parse_rutor($url, $gettorrent)
{
   $ch = curl_init($url);
   curl_setopt($ch, CURLOPT_USERAGENT, 'IE20');
   curl_setopt($ch, CURLOPT_HEADER, 0);
   curl_setopt($ch, CURLOPT_RETURNTRANSFER, '1');
   $copy_page = curl_exec($ch);
   curl_close($ch);
   
   preg_match("#\<table id=\"details\"\>.*?<br />(.*?)\<tr\>\<td class=\"header\"\>#si", $copy_page, $copy_post);
   preg_match ("#<title>rutor.org :: (.*?)</title>#si", $copy_page, $r_title);

   $copy_release = $copy_post[1]; // line 71 of functions_parser.php (first notice)

   $text = preg_replace("#<a href=\"(.*?)\".*?>(.*?)</a>#si", "[url=\\1]\\2[/url]", $copy_release);
   $text = preg_replace("#<img src=\"(\S*?)\" style=\"float:(.*?);\" />#si", "[img=\\2]\\1[/img]", $text);
   $text = preg_replace("#<img src=\"(\S*?)\" />#si", "[img]\\1[/img]", $text);
   $text = preg_replace("#<hr />#si", "[hr]", $text);

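   //hide
   // The "e" modifier only has meaning for preg_replace() and is rejected by PHP 7+, so it is dropped here.
   while (preg_match("#<div class=\"hidewrap\">.*?this\)\)\">(.*?)</div>.*?<textarea class=\"hidearea\">(.*?)</textarea></div>#si", $text, $match))
   {
     $replace = "[spoiler=\"".strip_tags($match[1])."\"]".$match[2]."[/spoiler]";
     // Literal replacement: avoids re-quoting the match as a regex and treating "\1" or "$1" in $replace as backreferences.
     $text = str_replace($match[0], $replace, $text);
   }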

PHP:

    if ($gettorrent === 1) {
        preg_match ("#<a href=\"(http:\/\/d\.rutor\.org\/download\/\d+)\"#si", $copy_page, $r_torrent);
        $torrent_url = $r_torrent[1];
        $options_torrent = array(CURLOPT_URL => $torrent_url);
        $torrent_hidden = get_torrent($options_torrent);
    }else {
        $torrent_hidden = '';
    }

    $pars_data = array("title" => $r_title[1], "bbcode" => strip_tags($text), "hidden" => $torrent_hidden); // line 125 of functions_parser.php (second notice)
    return $pars_data;

}

The engine is a clean 2.1.5 install.
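
An "Undefined offset: 1" notice on lines like these appears when `preg_match()` finds nothing, so the result array has no index 1. That happens if the page markup no longer fits the pattern, or if the download itself failed; which of the two applies here is not established in the thread. A minimal sketch of a guard follows. The helper name `first_group` and the sample pages are assumptions for illustration.

```php
<?php
/**
 * Hypothetical helper: returns capture group 1, or null when the pattern
 * does not match or the subject is not a string
 * (curl_exec() returns false on failure).
 */
function first_group($pattern, $subject)
{
    if (!is_string($subject)) {
        return null;
    }
    if (preg_match($pattern, $subject, $m) === 1 && isset($m[1])) {
        return $m[1];
    }
    return null;
}

$title_pattern = '#<title>rutor\.org :: (.*?)</title>#si';

// Made-up pages: one fits the pattern, one does not.
$ok_page  = '<html><title>rutor.org :: Some release</title></html>';
$bad_page = '<html><title>Another layout</title></html>';

var_dump(first_group($title_pattern, $ok_page));  // string(12) "Some release"
var_dump(first_group($title_pattern, $bad_page)); // NULL
var_dump(first_group($title_pattern, false));     // NULL
```

Inside `parse_rutor()` the caller could then stop early, or report that the layout has changed, when the helper returns null instead of reading `$copy_post[1]` or `$r_title[1]` directly.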

Nikita11 · 9 Oct 2015

whuru wrote:

It does not parse from rutor or from noname. With rutracker everything is fine. When I try to parse from rutor, it throws notices.


You know, I struggled with a similar problem for a long time too. One option I can suggest is to reconfigure the parser for krutor or frutor: pages from those parse fine, torrent file included.

whuru · 10 Oct 2015

It does not parse from "бабочка" (the "butterfly" tracker) either. The notices are similar, only on different lines, accordingly.

valera22 · 11 Oct 2015

Hi all. A question: does it only parse one torrent at a time? Can it be set to parse, say, from page 1 through page 3?

Ragnar · 11 Oct 2015

valera22 wrote:
Hi all. A question: does it only parse one torrent at a time? Can it be set to parse, say, from page 1 through page 3?

There are autoparsers here.

whuru · 11 Oct 2015

Whoa. The posts about upgrading the mod have disappeared. What happened? I was just about to comment on the latest changes.

valera22 · 12 Oct 2015

Could you send me the link to that thread, please? I could not find it.

Ress Kent · 17 Oct 2015

To remove rutor release groups from titles.
For example:
Ботаны [01x01-08 из 60] (Ай Ти Рота) (2015) SATRip от Files-x
Полиция Чикаго / Chicago P.D. [03х01-03 из 15] (2015) WEB-DL 1080p от MegaPeer | Шадинский
Вместе с дельфинами [01х03] (2015) HDTV 1080i от GeneralFilm
Will be replaced with:
Ботаны [01x01-08 из 60] (Ай Ти Рота) (2015) SATRip
Полиция Чикаго / Chicago P.D. [03х01-03 из 15] (2015) WEB-DL 1080p
Вместе с дельфинами [01х03] (2015) HDTV 1080i

Everything after the word " от" ("by"), together with " от" itself, will be removed.

Find
preg_match ("#<title>rutor.org :: (.*?)</title>#si", $copy_page, $r_title);
Insert below
$r_title = preg_replace("#(.*?) от.*#si", "\\1", $r_title);

Or a more targeted release-group filter:

$r_title = str_replace(array(' от ', 'GeneralFilm', 'Files-x', 'Scarabey'), '', $r_title);

Add further names at your own discretion.
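
One caveat with the broad pattern: it cuts at the first " от", so a title that itself contains that sequence (for example the word "отель" after a space, or a phrase like "Письма от ...") would be truncated early. Below is a sketch of a stricter variant, under the assumption that the group suffix is always the last part of the title and contains no brackets or parentheses. The helper name `strip_release_group` and the third sample title are made up for illustration.

```php
<?php
// Hypothetical helper; works on a single UTF-8 title string.
function strip_release_group($title)
{
    // " от <group>" is removed only when nothing but the group name follows
    // up to the end of the title (no brackets or parentheses in between).
    $result = preg_replace('/\s+от\s+[^()\[\]]+$/u', '', $title);
    // preg_replace() returns null on error (e.g. invalid UTF-8); keep the input then.
    return $result === null ? $title : $result;
}

echo strip_release_group('Ботаны [01x01-08 из 60] (Ай Ти Рота) (2015) SATRip от Files-x'), "\n";
echo strip_release_group('Полиция Чикаго / Chicago P.D. [03х01-03 из 15] (2015) WEB-DL 1080p от MegaPeer | Шадинский'), "\n";
// Made-up title with " от " inside the name itself: only the trailing group goes.
echo strip_release_group('Письма от Джульетты (2010) BDRip от HQCLUB'), "\n";
```

Since `$r_title` is the match array filled by `preg_match()`, this variant would be applied to the captured title, e.g. `$r_title[1] = strip_release_group($r_title[1]);`.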

Ress Kent · 17 Oct 2015

Added all the main release groups found on rutor:

PHP:

    $r_title = str_replace(array(' от ', 'GeneralFilm', 'Files-x', 'Scarabey', 'MediaClub', 'R.G. HD-Films', 'HQ-ViDEO', 'HELLYWOOD', 'HQCLUB', 'ExKinoRay', 'Generalfilm', 'NovaLan', 'New-Team', '-=HD-NET=-', 'qqss44', 'Leonardo', 'HDReactor', 'torrentfilm'), '', $r_title);
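
With plain `str_replace()`, " от " is removed wherever it occurs, and a group that is missing from the list stays glued to the title (for example "1080p от MegaPeer" becomes "1080pMegaPeer"). Below is a sketch of a list-driven variant that cuts the suffix only when " от " is followed by a known group. The helper name `strip_known_groups` and the shortened list are assumptions for illustration.

```php
<?php
// Hypothetical helper: cuts " от <group>..." only for groups listed in $groups.
function strip_known_groups($title, array $groups)
{
    if (count($groups) === 0) {
        return $title; // nothing to look for
    }
    // Escape each name so ".", "-", "=" and the "#" delimiter are literal.
    $escaped = array();
    foreach ($groups as $group) {
        $escaped[] = preg_quote($group, '#');
    }
    // i: ignore case (covers GeneralFilm/Generalfilm); u: treat input as UTF-8.
    $pattern = '#\s+от\s+(?:' . implode('|', $escaped) . ').*$#iu';
    $result = preg_replace($pattern, '', $title);
    // preg_replace() returns null on error; keep the input then.
    return $result === null ? $title : $result;
}

// Shortened list for the example; extend as needed.
$groups = array('GeneralFilm', 'Files-x', 'R.G. HD-Films', '-=HD-NET=-');

echo strip_known_groups('Вместе с дельфинами [01х03] (2015) HDTV 1080i от GeneralFilm', $groups), "\n";
// A group that is not in the list leaves the title unchanged.
echo strip_known_groups('Фильм (2015) BDRip от SomeoneElse', $groups), "\n";
```

The trade-off is the same as with the original list: unknown groups are not removed, but they also no longer damage the title.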